process in which data provides the anchor, and adjustments are made for "what
might have been." The latter is modeled as the result of a mental simulation
process that incorporates the unreliability of the source and one's attitude
toward ambiguity in the circumstances. A two-parameter model of this process
is shown to be consistent with: Keynes' idea of the "weight of evidence," the
non-additivity of complementary probabilities, current psychological theories
of risk, and Ellsberg's original paradox. The model is tested in four
experiments at both the individual and group levels. In experiments 1-3, the
model is shown to predict judgments quite well; in experiment 4, the inference
model is shown to predict choices between gambles. The results and model are
then discussed with respect to: (1) the importance of ambiguity in assessing
perceived uncertainty; (2) the use of cognitive strategies in judgments under
ambiguity; (3) the role of ambiguity in risky choice; and (4) extensions of
the model.
Ambiguity and Uncertainty in Probabilistic Inference


The literature on how people make judgments under uncertainty is large,
complex, and rife with controversy (see e.g., Edwards, 1954, 1968; Peterson &
Beach, 1967; Slovic & Lichtenstein, 1971; Rapoport & Wallsten, 1972; Slovic,
Lichtenstein, & Fischhoff, 1977; Einhorn & Hogarth, 1981; Kahneman, Slovic, &
Tversky, 1982; Cohen, 1982; Kyburg, 1983). One reason for the controversy is
that while there is agreement that "uncertainty" is a crucial factor in
inference, there is much less agreement about its meaning and measurement (cf.
Tversky & Kahneman, 1982). In particular, while most psychological work on
inference has been guided by a Bayesian or subjectivist view of probability,
increasing concerns have been stressed about this position (e.g., Cohen,
1977; Shafer, 1978). Central to the Bayesian view is the idea that
probability, which is a measure of one's degree of belief, can be operationalized
via choices amongst gambles (Savage, 1954). Thus, if two gambles have
identical payoffs but one is preferred to the other, it follows that the
probability of winning is greater for the chosen alternative.

The  subjectivist  view  of  probability  gains  much  of  its  force  by  making 
expressions  of  uncertainty  operational  via  choices  amongst  gambles.  However, 
whereas  probability  is  thereby  defined  precisely,  does  this  procedure  capture 
the  essential  psychological  aspects  of  uncertainty?  In  particular,  how  valid 
is  the  assumption  that  expressions  of  uncertainty  can  be  captured  through 
choices  amongst  gambles?  An  important  and  direct  attack  on  this  assumption 
was  put  forward  by  Daniel  Ellsberg  (1961)  and  we  examine  his  arguments  below. 
In doing so, however, we stress that our intent is to understand the
psychological bases of uncertainty rather than to critique the normative status
of the Bayesian position.


Ellsberg (1961) used the following example to show that the uncertainty
people experience has several aspects, one of which is not captured in the
usual betting paradigm: Imagine two urns, each containing red and black
balls. In urn 1, there are 100 balls but the proportions of red and black are
unknown; urn 2 contains 50 red and 50 black balls. Now consider the payoff
matrix shown in Table 1. Note that if one bets on red and it is drawn from
the urn, one gets $100; similarly for black. However, if one bets on the
wrong color, the payoff is $0. Imagine you are faced with having to decide
which color to bet on if a ball is to be drawn from urn 1; i.e., the choices
are red (R1), black (B1), or indifference (I). What about the same choices in
urn 2: (R2), (B2), or (I)? Most people are indifferent in both cases,
suggesting that the subjective probability of red in urn 1 is the same as the
known proportion in urn 2, namely .5. However, would you be indifferent to
betting on red if urn 1 were to be used vs. betting on red using urn 2 (R1 vs.
R2)? Similarly, what about B1 vs. B2? Many people find that they prefer R2
over R1 even though their indifference judgments within both urns imply
that p(R1) = p(R2) = .5. Furthermore, the same person who prefers R2 over
R1 may also prefer B2 over B1. This pattern of responses is inconsistent
with the idea that even a rank order of probabilities can be inferred from
choices. Thus, if R2 is preferred over R1, this implies that p(R2) > p(R1).
Moreover, since red and black are complementary events, this means
that p(B2) < p(B1). However, if B2 is preferred over B1, then p(B2) > p(B1),
which contradicts the preceding inequality. It is also important
to note that if p(R2) > p(R1) and p(B2) > p(B1), then either urn 2 has
complementary probabilities summing to more than 1 (super-additivity), or
urn 1 has complementary probabilities summing to less than 1 (sub-additivity).
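
The contradiction can be checked mechanically. The following Python sketch is
an illustration added here, not part of Ellsberg's argument; the function names
and the grid search are assumptions made for the example. It fixes
p(R2) = p(B2) = .5 for the known urn, imposes additivity on urn 1, and looks for
any value of p(R1) that is consistent with preferring both R2 over R1 and B2
over B1.

```python
def prefers_urn2_on_both_colors(p_r1, p_b1, p_r2=0.5, p_b2=0.5):
    """True if the probabilities rationalize preferring R2 over R1 and B2 over B1."""
    return p_r2 > p_r1 and p_b2 > p_b1


def additive_solutions(steps=100):
    """Grid-search p(R1) under additivity, i.e. p(B1) = 1 - p(R1) (hypothetical helper)."""
    solutions = []
    for i in range(steps + 1):
        p_r1 = i / steps
        p_b1 = 1.0 - p_r1  # additivity of complementary events in urn 1
        if prefers_urn2_on_both_colors(p_r1, p_b1):
            solutions.append(p_r1)
    return solutions


# No additive assignment for urn 1 supports both preferences.
print(additive_solutions())

# A sub-additive pair for urn 1 (summing to .8) does support both.
print(prefers_urn2_on_both_colors(0.4, 0.4))
```

Under these assumptions the search comes back empty, whereas relaxing
additivity for urn 1 removes the contradiction, which is the sub-additivity
case described above.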


Although Ellsberg did not specifically discuss the non-additivity of
complementary probabilities (cf. Fellner, 1961), we shall show that it is
intimately related to the effects of different types of uncertainty on
probabilistic judgments.

From our perspective, the importance of Ellsberg's paradox lies in the
difference  in  the  nature  of  the  uncertainty  between  urns  1  and  2.  In  urn  1, 
whereas  one's  best  estimate  of  the  proportion  may  be  .5,  confidence  in  that 
estimate  is  low.  In  urn  2,  on  the  other  hand,  one  is  at  least  certain  about 
the  uncertainty  in  the  urn.  While  it  may  seem  strange,  and  even  awkward,  to 
speak  of  uncertainty  as  being  more  or  less  certain  itself,  such  a  concept 
captures  an  important  aspect  of  how  people  make  inferences  from  unknown,  or 
only  partially  known,  generating  processes.  Indeed,  the  idea  of  uncertainty 
about uncertainty has been considered from time-to-time under the rubrics,
"second-order" uncertainty and probabilities for probabilities (e.g.,
Marschak, 1975). However, whereas this concept has received little support
amongst  subjectivist  statisticians  (see  e.g.,  de  Finetti,  1977),  its  status  as 
a  psychological  factor  of  importance  for  understanding  choice  and  inference 
has  been  demonstrated  experimentally  (Becker  &  Brownson,  1964;  Yates  & 
Zukowski,  1976).  On  the  other  hand,  the  process  by  which  such  second-order 
uncertainty  is  used  in  inference  and  the  factors  that  affect  its  use,  have  not 
been systematically studied. To be sure, Ellsberg suggested a number of
variables that should affect the "ambiguity" of a situation, including the
amount, type, reliability, and degree of conflict in the available information.

Indeed, he stated that:

Ambiguity is a subjective variable, but it should be possible
to identify 'objectively' some situations likely to present
high ambiguity, by noting situations where available information
is scanty or obviously unreliable or highly conflicting;
or where expressed expectations of different individuals differ
widely; or where expressed confidence in estimates tends to be
low. Thus, as compared with the effects of familiar production
decisions or well-known random processes (like coin-flipping or
roulette), the results of Research and Development, or the
performance of a new President, or the tactics of an unfamiliar
opponent are all likely to appear ambiguous. (1961, pp. 660-661).

To  specify  the  concept  of  ambiguity  more  precisely,  reconsider  the  urn 
where  the  proportion  of  red  and  black  balls  is  unknown.  From  a  Bayesian 
perspective,  this  situation  can  be  thought  of  as  one  in  which  the  judge  has  a 
diffuse  prior  over  all  possible  values  of  the  proportion,  p(R).  However, 
imagine  that  one  sampled  four  balls  (without  replacement)  and  got  3  red  and  1 
black. Note that this result rules out certain values of p(R) and could
change  one's  assessment  of  other  values  of  p(R).  Furthermore,  as  the  sample 
size  increases,  one  should  become  more  sure  as  to  the  actual  value  of  p(R). 
Therefore,  as  information  increases,  ignorance  (a  uniform  distribution),  gives 
way  to  ambiguity  (a  non-uniform  distribution  over  all  outcomes),  which  then 
reduces  to  a  known  p(R).  However,  while  it  is  tempting  to  equate  ambiguity 
with  some  statistical  measure  of  the  dispersion  of  the  subjective 
distribution,  this  is  unsatisfactory  for  the  following  reason:  consider  an 
urn  that  contains  either  all  red  or  all  black  balls  but  you  don't  know 
which.  In  such  a  case  we  can  characterize  the  distribution  over  p(R)  as 
having  half  its  mass  at  zero  and  half  at  one.  Note  that  the  variance  or  range 
of  this  distribution  is  high,  yet,  ambiguity  is  low.  The  reason  is  that  such 
a  distribution  rules  out  all  values  of  p(R)  other  than  0  or  1  and  is  thus 
close to the case where ambiguity doesn't exist (as in urn 2). Therefore, in
accord  with  its  dictionary  definition,  "having  two  or  more  possible  meanings," 
ambiguity  is  a  function  of  the  number  of  alternative  parameter  values  that  are 
not  ruled  out  (or  made  implausible)  by  one's  knowledge  of  the  situation.  Note 
that  this  definition  is  similar  to,  but  not  identical  with,  statistical 
measures  such  as  variance,  range,  and  the  like. 
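
The contrast between dispersion and ambiguity can be made concrete. The
Python sketch below is an illustration only: the helper names, the
representation of a belief as a dictionary over candidate values of p(R), and
the `threshold` used to decide when a value counts as "not ruled out" are all
assumptions introduced for the example, not part of the definition above.

```python
def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(value * weight for value, weight in dist.items())
    return sum(weight * (value - mean) ** 2 for value, weight in dist.items())


def plausible_count(dist, threshold=0.0):
    """Count parameter values not ruled out, i.e. with probability above `threshold`."""
    return sum(1 for weight in dist.values() if weight > threshold)


# In an urn of 100 balls, p(R) can be 0/100, 1/100, ..., 100/100.
values = [i / 100 for i in range(101)]

ignorance = {v: 1 / 101 for v in values}  # uniform: no value is ruled out
all_or_none = {0.0: 0.5, 1.0: 0.5}        # urn is all red or all black

for name, dist in [("ignorance", ignorance), ("all-or-none", all_or_none)]:
    print(name, round(variance(dist), 3), plausible_count(dist))
```

In this sketch the all-or-none urn has the larger variance but only two
admissible values of p(R), while the uniform case has the smaller variance and
101 admissible values, so the two measures order the situations differently.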


It is important to note that sample size is only one factor that
influences ambiguity since other information can affect the probability
distribution over the parameter of a stochastic process. Thus, imagine an urn
factory where employees color balls by throwing them at two adjacent cans of
black and red paint from a distance of 20 feet. Given our knowledge of this
process, it seems fair to expect that an urn of 100 balls would not contain
extreme proportions of red or black. A second example, due to Gardenfors and
Sahlin (1982), is particularly illuminating on this issue:

...  consider  Miss  Julie  who  is  invited  to  bet  on  the  outcome 
of  three  different  tennis  matches.  As  regards  match  A,  she  is 
very  well-informed  about  the  two  players  ....  Miss  Julie 
predicts  that  it  will  be  a  very  even  match  and  a  mere  chance 
will determine the winner. In match B, she knows nothing
whatsoever about the relative strength of the contestants . . . and
has  no  other  information  that  is  relevant  for  predicting  the 
winner  of  the  match.  Match  C  is  similar  to  match  B  except  that 
Miss  Julie  has  happened  to  hear  that  one  of  the  contestants  is 
an  excellent  tennis  player,  although  she  does  not  know  anything 
about  which  player  it  is,  and  that  the  second  player  is  indeed 
an  amateur  so  that  everybody  considers  the  outcome  of  the  match 
a  foregone  conclusion.  (pp.  361-362). 

Note  that  the  amount  and  type  of  information  in  the  three  situations  is  quite 
different,  as  is  the  amount  of  ambiguity  (we  would  argue  that  match  A  has  the 
least  ambiguity  and  match  B  the  most).  From  our  perspective,  how  does  the 
amount  and  type  of  ambiguity  affect  judgments  of  the  probability  of  winning  or 
losing  the  match?  Would  Miss  Julie,  for  example,  judge  that  each  player  in 
the  three  matches  has  a  .50  chance  of  winning  (or  losing)? 

Our  discussion  so  far  has  strongly  implied  that  ambiguity  is  generally 
avoided  since  it  adds  to  the  total  uncertainty  of  a  situation.  Indeed,  this 
is  explicitly  mentioned  by  Ellsberg  (1961,  p.  666)  in  discussing  why  new 
technologies  will  be  resisted  more  than  one  would  expect  on  the  basis  of  their 
first-order  probabilities.  However,  this  picture  is  not  completely  accurate, 
as is made clear by another example provided by Ellsberg (as quoted in Becker
& Brownson, 1964, pp. 63-4, footnote 4): consider two urns with 1000 balls
each.  In  urn  I,  each  ball  is  numbered  from  1  to  1000  and  the  probability  of 
drawing  any  number  is  .001.  In  urn  II,  there  are  an  unknown  number  of  balls 
bearing  any  single  number.  Thus,  there  may  be  1000  balls  with  number  687,  no 
balls  with  this  number,  or  anything  in  between.  If  there  is  a  prize  for 
drawing  number  687  from  the  urn,  would  you  prefer  to  draw  from  urn  I  or  urn 
II?  Note  that  urn  I  has  no  ambiguity  and  each  numbered  ball  has  the  same  .001 
chance  of  being  drawn.  Urn  II,  on  the  other  hand,  can  be  characterized  as 
inducing  extreme  ambiguity  (i.e.,  ignorance).  However,  for  many  people,  the 
drawing  from  urn  II  seems  considerably  more  attractive  than  from  urn  I, 
thereby  implying  that  there  are  situations  in  which  ambiguity  is  preferred 
rather  than  avoided.  This  is  considered  in  detail  later,  but  we  note  here 
that  accounting  for  such  shifts  is  an  important  criterion  for  judging  the 
adequacy  of  any  theory  of  inference  under  ambiguity. 

Finally, the concepts of ambiguity, second-order uncertainty, and the
like, have been of concern in theories of inference quite apart from their
role in affecting choice. For example, work on fuzzy sets (Zadeh, 1978),
Shafer's theory of evidence (1976), Cohen's (1977) attempt to formalize
uncertainty in legal settings, and the elicitation of probability ranges
(Wallsten, Forsyth, & Budescu, 1983), all contain ideas concerning the
vagueness that can underlie probabilities. Indeed, statisticians have provided
axiomatic systems for trying to formalize probability ranges and rank orders
rather than specific values (e.g., Koopman, 1940). Moreover, early work by
Keynes (1921) also addressed the notion of ambiguity by distinguishing between
probability and what he called the "weight of evidence." He stated:

The magnitude of the probability . . . depends upon a balance
between what may be termed the favourable and the unfavourable
evidence; a new piece of evidence which leaves this balance
unchanged, also leaves the probability of the argument unchanged.
But it seems that there may be another respect in which some kind
of quantitative comparison between arguments is possible. This
comparison turns upon a balance, not between the favourable and
unfavourable evidence, but between the absolute amounts of
relevant knowledge and of relevant ignorance respectively.
(Keynes, 1921, p. 71, original emphasis).

Plan  of  the  Paper 

We  first  examine  the  underlying  structure  of  a  set  of  problems  in 
which  ambiguity  is  a  major  factor  and  note  how  this  structure  differs  from 
unambiguous  situations.  We  then  devote  the  major  part  of  the  paper  to  the 
development  and  testing  of  a  descriptive  model  of  how  people  make  probability 
judgments  and  choices  under  varying  amounts  of  ambiguity.  The  model  is  tested 
in  four  experiments  at  both  the  aggregate  and  individual  subject  levels.  The 
implications  of  the  theory  and  empirical  work  are  then  discussed  in  relation 
to: (a) the importance of ambiguity in assessing perceived uncertainty;
(b) the use of cognitive strategies in understanding probabilistic judgments
under ambiguity; (c) the role of ambiguity in risky choice; and (d) extensions
of the model to multiple sources and time periods.

A  Model  for  Studying  Ambiguity  in  Inference 

The  prototypical  inference  that  we  consider  involves  a  judge  assessing 
the  likelihood  of  the  occurrence  of  an  event  based  on  reports  received  from 
a  source  of  limited  reliability.  The  task  can  be  thought  of  as  having  the 
elements schematically represented in Figure 1. (1) An event occurs;
(2) The event is "sensed" by observers (e.g., witnesses to an accident) who,
in principle, can be characterized by levels of sensitivity and bias.
However, it is important to emphasize that these levels are unknown to the judge
(see 5 below); (3) The observers report what they saw. We denote A* as the
report of event A, and B* as the report of event B, where the decision rule
is to report A* if the observation is above some critical value Xc, and
B* otherwise. The reports can therefore be conceptualized as coming from a
signal-detection task; (4) Since there are n observers, n reports are
collected. Thus, the n reports can be thought of as the outcomes of n
observers reporting on a single trial of a signal detection task.
Furthermore, since we do not differentiate between the n observers, we refer to
them as coming from a single source; (5) The judge receives the information in
the form of f reports for a hypothesis (i.e., f reports of A*) and c
reports of an alternative (i.e., c reports of B*), where f + c = n, and
p = f/n. The content of the scenario, however, is assumed to give the judge some
information as to what values of p to expect in a sample of size n.
Specifically, we argue that expectations concerning p will be influenced by,
(a) the dissimilarity between events A and B; and (b) the credibility of the
source. By "credibility" we mean the sensitivity and response bias of the
observers in judging the particular events of interest. For example, imagine
that you are a detective investigating a bank robbery where two witnesses
claim that the robber has blond hair and one witness claims it is brown. How
likely is it that the robber has blond hair? While the detective knows neither
the hit and false alarm rates of the witnesses, nor their response bias for
saying "blond" vs. "brown," he may know something about the quality of
eye-witnesses in a robbery, the confusability of blond and brown in the
circumstances, and perhaps something about the motivation of the witnesses.
Now contrast this with a situation where the source is two color television
cameras that were filming the robbery at the bank. Whereas in the former case
the detective would expect the reports to conflict (i.e., 0 < p < 1), in the
latter it would be surprising if p were not equal to either 0 or 1.

Figure 1. Structure of the inference task


Note  that  in  Figure  1,  we  have  represented  the  judge's  expectations  by 
three  different  distributions.  In  distribution  (1),  the  information  about  the 
credibility  of  the  source,  the  dissimilarity  of  the  signals,  and  the  size  of 
the  sample,  does  not  rule  out  many  values  of  p.  This  is  a  highly  ambiguous 
situation  and  would,  for  example,  characterize  the  detective  trying  to  judge 
evidence  from  witnesses.  Distribution  (2)  characterizes  expectations  based  on 
a  highly  credible  source  that  discriminates  between  dissimilar  signals;  e.g., 
evidence  from  cameras  filming  the  robbery.  We  believe  that  ambiguity  is  low 
here  since  our  knowledge  of  the  process  that  generates  evidence  rules  out  most 
values  of  p.  Distribution  (3)  also  represents  a  situation  of  low  ambiguity, 
but  it  is  quite  different  from  (2).  Indeed,  (3)  is  likely  to  result  when  the 
credibility of the source is particularly low and/or the signals are very
similar, in direct opposition to the conditions that produce (2). For
example, imagine a taste-test between Pepsi vs. Coke for randomly chosen
shoppers. If we believe that the two drinks have a very similar taste and
that most shoppers are not able to tell the difference, we would expect the
proportion of reports for either product to be around .5. Thus, results from
such a test might be seen as most closely resembling the drawing of balls from
an urn with known p = .5. It is interesting to note that whereas some
authors have equated increased reliability of evidence with less ambiguity (as
suggested by Ellsberg, for example), distribution (3) shows that decreased
reliability can also lead to low ambiguity. Another way to express this is to
note that high reliability implies low ambiguity (distribution (2)), but low
ambiguity does not imply high reliability (since distribution (3) could be
involved). As we will show later, both the amount and type of ambiguity
(distributions (1), (2), or (3)) affect how probabilistic judgments are made;
(6) The judge combines the information from the reports with expectations
about p to reach an assessment of the likelihood of A.

The  structure  of  this  task  is  both  similar  to  and  different  from  several 
probabilistic  models  of  the  inference  process.  First,  it  is  similar  to 
cascaded  inference  in  that  the  judge  is  making  inferences  about  an  event 
on  the  basis  of  unreliable  reports  (cf.  Schum  &  Kelley,  1973;  Schum,  1980). 
However,  in  contrast  to  studies  of  cascaded  inference,  the  judge  does  not  know 
the  precise  value  of  the  source's  reliability;  rather,  there  is  ambiguity 
concerning  what  this  is. 

Second,  since  each  observer  can  be  thought  of  as  participating  in  the 
same  signal  detection  task,  the  reports  not  only  reflect  their  sensitivity  to 
competing  signals,  but  also  their  bias  due  to  differential  payoffs.  However, 
as recently emphasized by Birnbaum (in press), the manner in which the judge
treats the observer reports depends on some theory about the observers. For
example, the observer reports could be responsive to the prior probabilities
of A and B as well as to differential payoffs. We emphasize that in our task
the  judge  is  not  given  precise  information  about  these  matters.  Furthermore, 
since  the  judge  only  receives  information  on  a  single  trial,  the  observers' 
hit-rate  and  false-alarm  rate  are  not  known.  Instead,  the  observed  p,  and 
the  judge's  expectations  about  p,  become  cues  to  the  likelihood  that  the 
event  occurred. 

Third,  one  might  consider  our  situation  as  a  conventional  Bayesian 
revision  task  (cf.  Edwards,  1968).  However,  the  explicit  probabilities 
necessary to assess the likelihood functions are not provided; and, no
base-rate data or prior probabilities are stated. It would, of course, be
possible to provide the judge with explicit prior probabilities. This would,
however, be extending our paradigm to one where multiple sources of information
need to be combined, i.e., base-rates and individuating information. For the
sake of simplicity, we only consider the effects of ambiguity on inferences
from a single source and thus do not discuss the effects of explicit base rates
(extensions of our model to multiple sources are considered in the Discussion
section).

Our  intent  above  has  been  to  show  how  our  task  is  both  similar  to,  and 
different  from,  formal  models  of  probabilistic  inference.  In  addition,  we 
note that although the inference task we consider is quite common, it
is  difficult  to  describe  it  formally  when  uncertainty  cannot  be  represented 
by  known  probabilities.  Be  that  as  it  may,  our  purpose  is  to  develop  a 
descriptive  model  of  how  inferences  under  ambiguity  are  made,  and  it  is  to 
this  that  we  now  turn. 

A  Descriptive  Model 

We  propose  that  in  making  judgments  under  ambiguity,  people  use  an 
anchoring  and  adjustment  strategy  in  which  the  data  (reports)  serve  as  the 
anchor,  and  adjustments  reflect  both  the  amount  and  type  of  ambiguity  in  the 
situation. We begin by assuming that one has received n reports from some
source, with f reports "for" a particular hypothesis and c reports "con"
(n = f + c). When the judge is asked how likely it is that event A occurred
(or, hypothesis A is true), it is assumed that the proportion of reports for A
is used as the anchor (i.e., f/n). Note that if the question were reversed,
i.e., how likely is it that B occurred, the anchor would be c/n. To model
the adjustment process, we posit that people engage in a mental simulation or
subjective sensitivity analysis (cf. Fischhoff, et al., 1980; Kahneman &
Tversky, 1982) in which outcomes that might have happened are imagined and
used for adjusting the anchor.

We model this process in the following way: let S(f:c) be the judged
likelihood that some event occurred (or some hypothesis is true) on the basis
of f reports for and c reports against. Furthermore, let k be the
adjustment factor, which results as a net effect of simulating values both
greater and smaller than the observed p. Thus,

$$S(f:c) = p + k \tag{1}$$

To  illustrate  the  adjustment  process,  imagine  3  reports  from  witnesses  in 
which  2  claim  that  A  occurred  and  1  claims  it  was  B.  The  judged  likelihood  of 
A  is  equal  to  2/3  plus  an  adjustment  that  reflects  the  unreliability  of  the 
reports  and  the  type  of  ambiguity  in  the  situation.  The  simulation  process  is 
assumed to involve the values of p that might have occurred, but didn't:
3/3, 1/3, 0/3. Clearly, the more unreliable the reports, the more credence is
given to the simulation values as opposed to the observed data. Moreover, the
simulation is "constrained" by one's prior expectations as to the plausible
values of p that are likely in this situation (recall box 5 in Figure 1).
Therefore,  we  conceive  of  the  simulation  as  reflecting  the  reliability  of  the 
data,  which  is  due  to  the  credibility  of  the  source  and  the  dissimilarity  of 
the  signals,  and  the  type  of  ambiguity  in  the  situation. 

The  Simulation  Process 

Since S(f:c) varies between 0 and 1, equation (1) implies that k must
be constrained as follows:

$$-p \le k \le 1 - p \tag{2}$$

From a psychological viewpoint, this means that the direction of the
adjustment must be due, in part, to the value of p. Indeed, when p = 0,
k ≥ 0, and the adjustment (if there is one) must be upwards; when p = 1,
k ≤ 0, so that the adjustment must be downwards. When p ≠ 0, 1, one can
imagine greater and smaller p's, but the numbers of each are constrained by
the particular value of the observed p. In order to model the simulation
process, we consider k to be the net effect of the difference between the
number of greater and smaller p's; specifically,

$$k = (k_g - k_s)/n \tag{3}$$

where, $k_g$ = number of greater p's used in the simulation
       $k_s$ = number of smaller p's used in the simulation
       n = total number of reports

The difference, $k_g - k_s$, is divided by n since $k_g$ and $k_s$ are numbers
of cases, while k must satisfy equation (2). To illustrate how (3) works,
reconsider the example of evaluating S(2:1). Note that there is one case of
greater p, (3/3), and two of smaller, (1/3, 0/3). If the judge "uses" all
three cases and weights greater and smaller cases equally (both of these
issues will be discussed below), k = -1/3 and the anchor of 2/3 would be
adjusted downwards to 1/3.
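
The counting in this example can be written out directly. The Python sketch
below is an illustration only and covers just the special case described in the
preceding paragraph, in which every alternative value of p is used and greater
and smaller cases are weighted equally; the function name is hypothetical.

```python
from fractions import Fraction


def equal_weight_adjustment(f, c):
    """Return (k_g, k_s, k, S) when all alternative p's are used with equal weight."""
    n = f + c
    observed = Fraction(f, n)
    # Every proportion that n reports could have produced: 0/n, 1/n, ..., n/n.
    candidates = [Fraction(i, n) for i in range(n + 1)]
    k_g = sum(1 for p in candidates if p > observed)  # number of greater p's
    k_s = sum(1 for p in candidates if p < observed)  # number of smaller p's
    k = Fraction(k_g - k_s, n)                        # equation (3)
    return k_g, k_s, k, observed + k                  # equation (1)


print(equal_weight_adjustment(2, 1))
```

For S(2:1) this gives one greater case, two smaller cases, k = -1/3, and a
judged likelihood of 1/3, matching the worked example.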

In equation (3), $k_g$ and $k_s$ are defined as the number of cases used in
the simulation rather than the maximum number that could be used. These
latter values set an upper limit on the simulation and we consider them
first. Thereafter, we discuss: (a) how the unreliability of the data affects
the simulation; and, (b) the incorporation of differential weighting for
smaller vs. greater values of p. First, consider the constraints imposed
on $k_g$ and $k_s$ by the observed p. Specifically, as p increases, the
maximum value of $k_g$ decreases and the maximum value of $k_s$ increases. In
fact, these values can be written as,

$$k_g(\max) = n(1-p), \qquad k_s(\max) = np \tag{4}$$


For example, if one had evidence of (7:3), $k_g(\max) = 3$ (consisting of 8/10,
9/10, 10/10) and $k_s(\max) = 7$. However, as pointed out earlier, the amount
that one simulates will be related to the perceived reliability of the data.
To incorporate this into the simulation process, let the parameter $\theta$
represent the unreliability of the reports received from the source (larger
values of $\theta$ indicating greater unreliability). However, since increasing
the amount of evidence (n) decreases unreliability, the overall effect of
unreliability of the reports can be expressed by,

$$UR = \theta/n \quad (\theta \le n) \tag{5}$$

where, UR = overall unreliability of the data (0 ≤ UR ≤ 1)
       $\theta$ = parameter reflecting the lack of credibility of the source and
                  dissimilarity of the signals
       n = number of reports

We can now consider the $k_g$ and $k_s$ used in the simulation as reflecting
the maximum values of each as weighted by the overall unreliability of the
data; specifically,

$$k_g = \frac{\theta}{n}\, n(1-p) = \theta(1-p) \tag{5a}$$

$$k_s = \frac{\theta}{n}\, np = \theta p \tag{5b}$$

Thus, if the source were perfectly reliable, $\theta = 0$ and there would be no
effect for the simulation. Clearly, as $\theta$ increases, the range of values
used in the simulation also increases.

Up to this point, we have treated greater and smaller values of p as
having equal weight or importance in the simulation. This is now rectified by
introducing our second parameter, β, which we call one's "attitude toward
ambiguity in the circumstances." Since k is the net effect of both k_g
and k_s, β only needs to affect one of these components for there to be a
differential weighting effect on k. Thus, we redefine k_s as,

$$k_s = \theta p^{\beta} \quad (\beta \ge 0) \tag{6}$$

We now substitute (6) and (5a) into (3) to get,

$$k = \frac{\theta}{n}\,(1 - p - p^{\beta}) \tag{7}$$

To see the implications of (7), the relations between k, p, and β are
illustrated in Figure 2. (In Appendix A, we consider alternative models that
result from different weighting assumptions.) First note that k reaches its
maximum value of θ/n when p = 0 (i.e., where all "might have
beens" must be positive), and its minimum of -θ/n when p = 1 (all "might
have beens" must be negative). Moreover, β plays no role when p = 0, 1,
since differential weight for imagined values of p is not an issue. Second,
the figure shows the effects of different levels of β on k: β > 1 (more
weight for k_g than k_s); β = 1 (equal weight for k_g and k_s); and
β < 1 (more weight for k_s than k_g). An important implication of
different values of β is that they affect the value of p for which there
is no adjustment (i.e., k = 0). Thus, for β = 1, k > 0 when p < .5,
and k < 0 when p > .5. In other words, a person with β = 1 will have
upward adjustments for p < .5, downward adjustments for p > .5, and no
adjustment for p = .5. When β < 1, the point of no adjustment, called the
"cross-over point" and denoted p_c, occurs below p = .5; for β > 1, the
cross-over point is above p = .5. In the presence of ambiguity, we expect
people to be generally conservative and to give more weight to the possible
values below p than to those above it. Thus, we consider β < 1 to be
typical of assessments made under high ambiguity. Conversely, as ambiguity
decreases we would also expect people to weight possible values below and
above p more symmetrically.
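
The text gives no closed-form expression for the cross-over point, but it can
be located numerically. With θ > 0, the adjustment is zero where k_g = k_s,
that is, where θ(1-p) = θp^β, or 1 - p - p^β = 0. The following is a minimal
sketch in Python, assuming simple bisection is adequate; the function name
`crossover_point` is illustrative and does not come from the paper.

```python
def crossover_point(beta, tol=1e-10):
    """Return p_c in [0, 1] solving 1 - p - p**beta = 0 (no net adjustment)."""
    if beta < 0:
        raise ValueError("beta must be non-negative")
    if beta == 0:
        # p**0 = 1 for every p, so 1 - p - 1 = 0 only at p = 0.
        return 0.0

    def g(p):
        return 1.0 - p - p ** beta

    # g(0) = 1 > 0, g(1) = -1 < 0, and g is strictly decreasing on [0, 1],
    # so bisection converges to the unique root.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for beta in (0.5, 1.0, 2.0):
        print(beta, round(crossover_point(beta), 3))
```

Under this sketch, β = 1 returns .5, values of β below 1 return a cross-over
point below .5, and values above 1 return one above .5, as the text describes.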

Given the specification of k in (7), the full model is obtained by
substituting (7) into (1). This is,

$$S(f:c) = p + \frac{\theta}{n}\,(1 - p - p^{\beta}) \tag{8}$$
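
As a concrete illustration, equation (8) can be evaluated directly from the
report counts. This is a small sketch in Python; the function name
`judged_likelihood` and the argument order are assumptions made for the
example, and the constraint θ ≤ n from (5) is not enforced.

```python
def judged_likelihood(f, c, theta, beta):
    """Equation (8): S(f:c) = p + (theta / n) * (1 - p - p**beta).

    f, c  : number of reports for and against the hypothesis
    theta : unreliability parameter (theta >= 0)
    beta  : attitude toward ambiguity (beta >= 0)
    """
    n = f + c
    if n <= 0:
        raise ValueError("at least one report is required")
    p = f / n  # the anchor: proportion of reports favoring the hypothesis
    return p + (theta / n) * (1.0 - p - p ** beta)


if __name__ == "__main__":
    # Illustrative parameter values only.
    print(judged_likelihood(7, 3, theta=0.35, beta=0.10))
    print(judged_likelihood(70, 30, theta=0.35, beta=0.10))
```

With θ = 0 the function returns p itself, and for fixed p the result moves
toward p as the number of reports grows.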

The model in equation (8) has several implications: (1) Consider the effect
of the amount of information (n) on judged likelihood. Note that S → p
as n → ∞. This means that as the amount of information increases, one
becomes more certain as to the diagnosticity of the data. It is important to
realize that as n → ∞, S does not go to 0 or 1 as would be implied by a
standard Bayesian revision model. Instead, the fact that S asymptotes at
p parallels an analogous result in cascaded inference where, under certain
symmetry assumptions, the maximum probability of a hypothesis is bounded by
the reliability of the reporting source (Schum & DuCharme, 1971).

(2) Conditional on a given value of θ, the model implies that there
will be trade-offs between p and n in determining judged likelihood. For
example, one might find the evidence in favor of some hypothesis to be more
convincing on the basis of (9:1) than (2:0). However, because S asymptotes
at p, trade-offs of p and n will only occur at small values of n. More
generally, the model involves trade-offs between four factors: p, n, θ, and
β. This is illustrated in Figure 3, which shows how S is "regressive" with
respect to p. First, consider the left-hand panel where p_c is below .5.
For given β, the line aa' is determined by θ/n. However, the line aa'
becomes bb' if either θ is made smaller or n is increased. That is, θ
and n trade off. Second, consider the right-hand panel in which only one
parameter has been changed; p_c is now above .5. At the extremes (p = 0, 1),
S is unaffected by β. However, for 0 < p < 1, β has a direct impact on
S in that S increases with β. Furthermore, note that the effect of θ
and n on judged likelihood is considerably reduced for values of p close
to p_c.

(3) What happens when someone judges the likelihood of two complementary
events? The sum of these judgments is given by,

$$\begin{aligned}
S(f:c) + S(c:f) &= \left\{p + \frac{\theta}{n}\left[1 - p - p^{\beta}\right]\right\} + \left\{(1-p) + \frac{\theta}{n}\left[p - (1-p)^{\beta}\right]\right\} \\
&= 1 + \frac{\theta}{n}\left[1 - p^{\beta} - (1-p)^{\beta}\right]
\end{aligned} \tag{9}$$

Equation (9) specifies precisely when judgments of complementary probabilities
are additive (i.e., sum to 1). Specifically, this occurs when either θ = 0,
p = 0, 1, or β = 1. Moreover, as n → ∞, the sum of judgments of
complementary events approaches 1. If the preceding conditions do not hold, the
amount of non-additivity is directly related to θ, and the type of
non-additivity depends on β and its implied p_c. Specifically, if β < 1,
complementary judgments will be sub-additive (i.e., sum to less than 1) since
[1 - p^β - (1-p)^β] < 0. However, β > 1 implies super-additivity since
[1 - p^β - (1-p)^β] > 0. The importance of equation (9) is that it makes strong
predictions as to when sub- or super-additivity will occur, as well as the
extent of these effects. Moreover, these phenomena depend on the reliability
of the data as captured via θ and individual attitudes toward ambiguity in
the circumstances, i.e., β. Indeed, as Ellsberg's (1961) work demonstrated,
ambiguity can lead to probabilities of complementary events that are
non-additive.

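
The additivity conditions can be checked numerically by summing the model's
judgments for an event and its complement. The sketch below is written in
Python under the assumption that both judgments share the same θ, β, and n;
the helper names are illustrative rather than taken from the paper.

```python
def judged_likelihood(p, n, theta, beta):
    """Equation (8), written in terms of the proportion p and sample size n."""
    return p + (theta / n) * (1.0 - p - p ** beta)


def complementary_sum(p, n, theta, beta):
    """S(f:c) + S(c:f), obtained by evaluating equation (8) at p and 1 - p."""
    return (judged_likelihood(p, n, theta, beta)
            + judged_likelihood(1.0 - p, n, theta, beta))


def closed_form(p, n, theta, beta):
    """Right-hand side of equation (9)."""
    return 1.0 + (theta / n) * (1.0 - p ** beta - (1.0 - p) ** beta)


def additivity(p, n, theta, beta, tol=1e-12):
    """Classify the sum of complementary judgments."""
    total = complementary_sum(p, n, theta, beta)
    if total < 1.0 - tol:
        return "sub-additive"
    if total > 1.0 + tol:
        return "super-additive"
    return "additive"


if __name__ == "__main__":
    # Illustrative values: theta = .35 and n = 8, with beta below, at, and above 1.
    for beta in (0.10, 1.0, 2.0):
        print(beta, additivity(0.5, 8, 0.35, beta))
```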
The above implications deal directly with inference. However, it is
difficult  (and  may  not  be  desirable)  to  discuss  probability  judgments  without 
considering  their  relation  to  choices  under  uncertainty.  Indeed,  we  began 
this  paper  by  discussing  Ellsberg's  paradox  and  stated  that  any  theory  of 
inference  under  ambiguity  must  explain  Ellsberg's  original  result  and  his 
later  example  demonstrating  ambiguity  preference.  In  order  to  do  so,  we 
derive  a  similar  expression  to  equation  (8)  for  capturing  the  effects  of 
ambiguous probability assessments on choice. We begin by defining S(p_A)
as an assessed probability made in an ambiguous situation (e.g., probabilities
assessed on red and black in Ellsberg's urn I). Furthermore, we assume that,

$$S(p_A) = p_A + k \tag{10}$$

where p_A is a value on which the judge anchors (this could be self-generated
or given by another; e.g., in a gambling task), and k is the net effect of
the adjustment for ambiguity. Thus, S(p_A) is the result of an anchor, p_A,
and an adjustment process that reflects the ambiguity in the situation. As
discussed previously, k can be decomposed into k_g and k_s,

$$k = \frac{k_g - k_s}{m} \tag{11}$$

where, m = total number of values of p that could be considered
in the simulation.

Denote θ as the amount of ambiguity in the situation and,

$$k_g = \theta\, m\,(1 - p_A), \qquad k_s = \theta\, m\, p_A^{\beta} \tag{12}$$

Equation (12) simply recapitulates the inclusion of an ambiguity weight θ,
and the parameter β, which reflects the differential weight for greater and
smaller p's. When (12) is substituted into (11) and (10), we obtain,

$$S(p_A) = p_A + \theta\,(1 - p_A - p_A^{\beta}) \tag{13}$$

Equation  (13)  parallels  equation  (8),  except  that  n  no  longer  plays  any 
role. 

To show how (13) can explain the Ellsberg results, consider Figure 4,
which shows S(p_A) as a function of p_A for three separate pairs of values
of θ and β. Consider (4a), where θ > 0 and β < 1. Note that a person
with parameter values in these ranges will "underweight" all p_A above p_c,
and "overweight" p_A < p_c. This particular pattern explains why most people
in Ellsberg's urn example choose the unambiguous urn II; that is, S(p_A = .5)
< .50. However, note that if p_A is less than p_c, S(p_A) > p_A and one
would expect the same person who avoided the ambiguous urn when p_A = .5, to
prefer the ambiguous urn when p_A is sufficiently low. The pattern of
overweighting small p_A and underweighting moderate-to-large p_A also accounts
for some otherwise puzzling results of Goldsmith and Sahlin (as reported in
Gärdenfors & Sahlin, 1982). They presented subjects with descriptions of
either well-known events (e.g., drawing cards from a standard deck), or events
about which the subjects had little knowledge (e.g., the likelihood of a bus
strike in Verona, Italy next week). Subjects estimated the probabilities of
the events and the perceived reliability of their probability estimates.
Events with equal probabilities but unequal reliabilities were then used in a
lottery set-up. The authors report that,

... for probabilities other than fairly low ones, lottery
tickets involving more reliable probability estimates tend to be
preferred. (Gärdenfors & Sahlin, 1982, p. 363, our emphasis.)

While the pattern shown in Figure 4a accounts for much data, it does not
explain why some people in the Ellsberg task prefer to bet on drawing from the
ambiguous urn when p_A = .5. However, consider a person with an S(p_A)
function as shown in Figure 4b. When θ > 0 and β > 1, we get "ambiguity
preference" over most of the range of p_A. Thus, when p_A < p_c, S(p_A) > p_A
and over-weighting occurs; when p_A > p_c, S(p_A) < p_A and underweighting
occurs. Since individual differences are rarely accounted for in research on
decisions under uncertainty, our model has the distinct advantage of positing
a general psychological process while allowing for individual differences via
particular parameter values. Indeed, this is nicely illustrated by considering
people who are indifferent between gambles from ambiguous and unambiguous
urns when p_A = .5 (as in the Ellsberg case). Our model suggests two
distinct types: those for whom θ = 0, and thus, S(p_A) = p_A; and those for
whom θ > 0 and β = 1 (shown in Figure 4c). This latter group does not
adjust at p_A = .5, but does adjust at all other values. Therefore, people
characterized by these parameter values will only be indifferent between
lotteries at .5.
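
The three patterns described for Figure 4 can be reproduced by evaluating
equation (13) at a few anchors. The following Python sketch uses θ = .2 and
β values of .5, 2, and 1 purely as hypothetical illustrations; these are not
values reported in the paper, and the function name is likewise an assumption.

```python
def ambiguous_weight(p_a, theta, beta):
    """Equation (13): S(p_A) = p_A + theta * (1 - p_A - p_A**beta)."""
    return p_a + theta * (1.0 - p_a - p_a ** beta)


if __name__ == "__main__":
    theta = 0.2  # hypothetical amount of ambiguity
    cases = {
        "4a (beta < 1)": 0.5,
        "4b (beta > 1)": 2.0,
        "4c (beta = 1)": 1.0,
    }
    for label, beta in cases.items():
        at_half = ambiguous_weight(0.5, theta, beta)
        at_low = ambiguous_weight(0.1, theta, beta)
        print(label, "S(.5) =", round(at_half, 3), "S(.1) =", round(at_low, 3))
```

In this sketch the β < 1 case gives S(.5) below .5 (avoidance of the ambiguous
urn) but S(.1) above .1, the β > 1 case gives S(.5) above .5, and the β = 1
case leaves the anchor unchanged only at .5.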

Finally, the model in (13) is relevant to the major psychological theory
that examines risk; namely, "prospect theory" (Kahneman & Tversky, 1979).
From our perspective, the treatment of uncertainty in prospect theory is
consistent with our approach since a decision-weight function is posited that
is remarkably similar to the S(p_A) function shown in Figure 4a. This is not
a coincidence since, as Kahneman and Tversky specifically point out, decision
weights can be affected by ambiguity. Indeed, they state,

The decision weight associated with an event will depend
primarily on the perceived likelihood of that event, which
could be subject to major biases. In addition, decision
weights may be affected by other considerations, such as
ambiguity or vagueness. Indeed, the work of Ellsberg and
Fellner implies that vagueness reduces decision weights. (p. 289)

While our equation (13) could be made fully compatible with the
decision-weight function of prospect theory (by restricting its applicability to
0 < p < 1 and thereby not defining the end points),1 we wish to emphasize
that (13) expresses a class of functions. Therefore, while the decision-
weight function of prospect theory expresses a general tendency to treat
uncertainty in a particular way, (13) allows for individual differences in the
handling of uncertainty.

EXPERIMENTAL  TESTS  OF  THE  MODEL 

To test our model empirically, we employed a direct inference task
(experiments 1-3) and one task dealing with choice (experiment 4). In the
direct task, people were asked to make probability judgments on the basis of
numbers of reports from a source. In experiment 1, we examined the various
implications of equation (8) by considering whether S(f:c) asymptotes at
p; whether the various parameter values are consistent with the additivity/
non-additivity of complementary events, and so on. In experiment 2, the model
was tested in different content scenarios, in order to generalize the results
from experiment 1. In experiment 3, scenarios that varied in the credibility
of the source and the dissimilarity of the signals were used. These allowed
us to investigate the effects of the overall reliability of the source on the
parameter values of the model. In addition, the consistency of individual
differences in strategy (as measured by a person's θ and β parameters)
was also considered. The choice experiment involved an attempt to answer the
question: Can an individual's choices between gambles be predicted from
knowledge of his or her θ and β parameters obtained from a separate
inference task? We now turn to experiment 1.

Experiment 1

Subjects.  Thirty-two  subjects  were  recruited  through  an  ad  in  the 
University  newspaper  which  offered  an  hour  for  participation  in  an 
experiment  on  judgment.  The  median  age  of  the  subjects  was  24,  their 
educational  level  was  high  (mean  of  4.4  years  of  formal  post-high  school 
education),  and  there  were  16  males  and  16  females. 

Stimuli. The stimuli consisted of a set of scenarios that involved a
hit-and-run accident seen by varying numbers of witnesses. Moreover, of the
n witnesses to the accident, f claimed that it was a green car while c
claimed it was a blue car. A typical scenario was phrased as follows:

An automobile accident occurred at a street corner in downtown
Chicago. The car that caused the accident did not stop
but sped away from the scene. Of the n witnesses to the
accident, f reported that the color of the offending car
was green, whereas c reported it was blue. On the basis of
this evidence, how likely is it that the car was green?

Each  scenario  was  printed  on  a  separate  page  and  contained  a  0-100  point 
rating  scale  that  was  used  by  the  subject  to  judge  how  likely  the  accident  was 
caused  by  a  particular  colored  car.  Each  stimulus  contained  the  same  basic 
story  but  varied  in  the  total  number  of  witnesses  (n),  the  number  saying  it 
was a green (f) or a blue car (c), and whether one was to judge the
likelihood that the majority or minority position was true. In order to sample a
wide range of values of n and p, 40 combinations were chosen as follows:
for p = 1, n = 2, 6, 12, 20; p = .89, n = 9, 18, 27; p = .80,
n = 5, 10, 15, 20, 25; p = .75, n = 4; p = .67, n = 3, 6, 9, 12, 15, 18, 24;
p = .60, n = 5, 10; p = .50, n = 2, 8, 12, 20; p = .40, n = 5, 10;
p = .33, n = 6, 9, 18; p = .25, n = 4; p = .20, n = 5, 10;
p = .11, n = 9, 18; p = 0, n = 2, 6, 12, 20.
In addition, 8 stimuli were given twice to ascertain test-retest reliability.
Thus, the total number of stimuli was 48, and they were arranged in booklet
form.3



Procedure 

When  the  subjects  entered  the  laboratory,  they  were  told  that  the 
experiment  involved  making  inferential  judgments.  Furthermore,  it  was  stated 
that  if  they  did  well  in  the  experiment  (without  specifying  what  this  meant), 
it  was  likely  that  they  would  be  called  for  further  experiments.  Given  the 
relatively  high  hourly  wage,  this  was  thought  to  provide  some  incentive  to 
take the task seriously. In order to avoid boredom and to reduce the
transparency that judgments of complementary events were sometimes required,
subjects  were  given  4  sets  of  12  stimuli  and,  after  completing  each  set,  they 
performed  a  different  task.  All  stimuli  were  randomly  ordered  within  the  four 
sets.  Subjects  could  take  as  much  time  as  they  needed  and  they  were  free  to 
make  as  many  (or  as  few)  calculations  as  they  wished.  After  completing  the 
task,  all  subjects  filled  out  a  questionnaire  regarding  various  demographic 
variables. 

Estimating  the  Model 

To estimate the model from the experimental data, we need to re-write
equation (8) and include a random error term to represent judgmental
inconsistency; therefore,

$$S(f:c) = p + \frac{\theta}{n}\,(1 - p - p^{\beta}) + \varepsilon \tag{14}$$

Equation (14) requires a non-linear estimation technique which was developed
in the following way: let S(f:c) be the actual response of the subject
and Ŝ(f:c) be the predicted response from the model. We wish to minimize
some loss function (we chose the mean absolute deviation, MAD), by finding
values of θ and β such that,

$$\sum \left| \hat{S}(f:c) - S(f:c) \right| = \text{minimum} \tag{15}$$

This was done by setting up a grid of values of θ and β and writing a
computer program to first compute the MAD for pairs of "coarse" values of θ
and β. Since certain ranges of θ and β can thus be excluded, the program
then considers "finer-grained" values until MAD is minimized.2 The output
from this analysis is a unique set of values for θ and β that minimizes
the desired loss function.
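
The coarse-to-fine search described above can be sketched as follows. This is
not the authors' program, whose details are not given here; it is a minimal
Python illustration that assumes judgments are rescaled to the 0-1 interval,
that the search starts on [0, 2] for both parameters, and that the window is
halved around the best pair after each pass. All names and default settings
are hypothetical.

```python
def predicted(f, c, theta, beta):
    """Model prediction from equation (8) for f reports for and c against."""
    n = f + c
    p = f / n
    return p + (theta / n) * (1.0 - p - p ** beta)


def mad(data, theta, beta):
    """Mean absolute deviation between observed and predicted judgments.

    data is a list of (f, c, judged) tuples with judged on a 0-1 scale.
    """
    total = sum(abs(s - predicted(f, c, theta, beta)) for f, c, s in data)
    return total / len(data)


def fit_grid(data, theta_max=2.0, beta_max=2.0, steps=21, rounds=6):
    """Coarse-to-fine grid search for the (theta, beta) pair minimizing MAD."""
    t_lo, t_hi = 0.0, theta_max
    b_lo, b_hi = 0.0, beta_max
    best = (float("inf"), None, None)  # (MAD, theta, beta)
    for _ in range(rounds):
        thetas = [t_lo + i * (t_hi - t_lo) / (steps - 1) for i in range(steps)]
        betas = [b_lo + j * (b_hi - b_lo) / (steps - 1) for j in range(steps)]
        for t in thetas:
            for b in betas:
                score = mad(data, t, b)
                if score < best[0]:
                    best = (score, t, b)
        # Halve the search window and re-centre it on the best pair so far,
        # keeping both parameters non-negative.
        t_half = (t_hi - t_lo) / 4.0
        b_half = (b_hi - b_lo) / 4.0
        t_lo, t_hi = max(0.0, best[1] - t_half), best[1] + t_half
        b_lo, b_hi = max(0.0, best[2] - b_half), best[2] + b_half
    return best[1], best[2], best[0]


if __name__ == "__main__":
    # Synthetic, noise-free judgments generated from known parameters.
    stimuli = [(2, 0), (0, 2), (6, 0), (0, 6), (4, 1), (1, 4), (3, 1),
               (1, 3), (2, 1), (1, 2), (1, 1), (4, 4), (8, 1), (1, 8)]
    data = [(f, c, predicted(f, c, 0.4, 0.5)) for f, c in stimuli]
    print(fit_grid(data))
```

The same `mad` function evaluated with θ = 0 gives the error of a model based
solely on p, which is the baseline comparison used in the analysis.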

Since the sampling distributions of θ̂ and β̂ are not known, testing
the statistical significance of the model's fit to the data is problematic.
We therefore adopted the strategy of comparing the accuracy of Ŝ(f:c) with
that of a model based solely on p. Moreover, since p is the anchor of the
assumed process, any difference between the accuracy of p and Ŝ(f:c) can
be attributed to the adjustment process, and thus to θ and β. We emphasize
that this procedure is biased against finding differences between p and
Ŝ(f:c) for two reasons: (a) the model predicts that S(f:c) → p as n
increases. Thus, since we have included some large values of n to test this
prediction, if S(f:c) ≈ p, this counts against, rather than for, the model;
(b) the model further predicts that S(f:c) = p at the cross-over point,
p_c, and will be close to p in the region of p_c. Again, if this occurs, it
counts against the model. We take this highly conservative approach to guard
against attributing random error in the data to an adjustment process.

Results 

Before discussing the major results, recall that for each subject, 8
stimuli were given twice so that test-retest reliability could be assessed.
This was done in two ways: (1) the correlation between judgments of the same
stimuli, within each subject (N = 8), was computed. The mean of these
correlations was .93, with 26 of the 32 coefficients greater than .90; (2) each
subject was considered a replication with 8 responses and the correlation
between judgments for identical stimuli, over subjects (N = 256 = 32 subjects
x 8 responses), was .91. Clearly, the reliability of the judgments was high,
regardless of the computational method.

For a general impression of how well the model fits the data, we first
consider an aggregate analysis (individual differences will be considered in
detail below). For each of the 48 stimuli, the judgments from the 32 subjects
were averaged to form the mean judged likelihood, S̄(f:c). This was then used
as the dependent variable to be fit by the model. The parameter values
obtained from the estimation program were, θ̂ = .35, β̂ = .10 (implying
p̂_c = .16). In addition, the mean absolute deviation of model and data was
.020, which is significantly lower than that of the baseline p-model (MAD =
.041; p < .001 using a Wilcoxon matched-pairs signed-ranks test).

To see whether the implications of the model hold, consider Table 2,
which shows S̄(f:c) and Ŝ(f:c) for the 48 stimuli.

Insert Table 2 about here

First, does S̄(f:c) → p as n increases? The data strongly support this when
p = 1, .67, .60, .50, .40, and 0. At the values of .75 and .25, n was not varied
although the large adjustments do suggest that the expected effect would
occur. However, the effect of n is less clear at p = .89, .80, and .33
since there is little initial adjustment at small n. Taken together, these
results suggest moderate support for the hypothesis. Second, do p and
n trade off in affecting judged likelihood? The evidence here is quite
convincing: e.g., note that S(8:1) = .88 > S(2:0) = .85, S(10:5) = .65 >
S(3:1) = .63, S(1:4) = .21 > S(1:3) = .20. Of particular interest is the
result that S(0:2) = .16 > S(1:8) = .12. This means that when there is
limited evidence, no data in favor of a hypothesis can be judged as stronger
evidence for that hypothesis than when more evidence is available with mixed
support. Third, an important implication of the model concerns the relation
between θ, β, and the additivity of complementary probabilities. Recall
from equation (9) that when θ > 0 and β < 1, S(f:c) + S(c:f) < 1, for
0 < p < 1. To see if sub-additivity exists in the data and is predictable
from the model, consider Table 3, which shows both S̄(f:c) + S̄(c:f) and
Ŝ(f:c) + Ŝ(c:f).

Insert Table 3 about here

Note that there is substantial sub-additivity and the
model does a reasonably good job of capturing it. In judging the performance
of the model in this regard, it is useful to remember that we have gone beyond
the qualitative prediction that sub-additivity will be present in the data, to
specifying both the amount of the effect and the conditions under which it
will not be present. Given these goals, we view the results as supporting our
model.
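
As a worked check on two of the trade-offs reported above, equation (8) can be
evaluated at the aggregate estimates (θ = .35, β = .10). The Python sketch
below is illustrative only; the function name is an assumption, and the
two-decimal estimates are used as given, so small rounding differences from
the tabled predictions are to be expected.

```python
def predicted(f, c, theta=0.35, beta=0.10):
    """Equation (8) at the aggregate parameter estimates reported in the text."""
    n = f + c
    p = f / n
    return p + (theta / n) * (1.0 - p - p ** beta)


if __name__ == "__main__":
    comparisons = [((8, 1), (2, 0)), ((0, 2), (1, 8))]
    for first, second in comparisons:
        print(first, round(predicted(*first), 3),
              second, round(predicted(*second), 3))
```

With these parameter values the model places S(8:1) above S(2:0) and S(0:2)
above S(1:8), the same orderings observed in the mean judgments for these two
comparisons.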

Individual  Analyses 

Since each subject rated all stimuli, we can fit the model for each
person. These results are shown in Table 4. It is clear from the table that
there are substantial individual differences in the parameter values and the
degree to which the model fits the data (as indicated by the MADs). When
compared with the aggregate analysis, note that the individual models contain
considerably more noise (recall that the MAD for the aggregate data is .020).
Furthermore, in comparing each subject's model against the baseline p-model,
14 of the 32 subjects showed no significant adjustment process, as specified
by our model, while 18 did. The reason for emphasizing "as specified by our
model" is that no subject, even those for whom θ̂ = 0, used a strict
p-strategy (i.e., S(f:c) = p for all p and n). Instead, some used p most of
the time but occasionally

TABLE 3

Sub-additivity for the Aggregate Data

                              Actual               Predicted
   p      (1 - p)     n       S̄(f:c) + S̄(c:f)      Ŝ(f:c) + Ŝ(c:f)

   1        0         2          1.01                 1.00
   1        0         6           .99                 1.00
   1        0        12          1.01                 1.00
   1        0        20           .99                 1.00
  .89      .11        9          1.00                  .97
  .89      .11       18          1.00                  .98
 (.89)    (.11)     (18)         (.98)                 .98
  .80      .20        5          1.01                  .95
 (.80)    (.20)      (5)         (.94)                 .95
  .80      .20       10           .98                  .97
  .75      .25        4           .83                  .93
  .67      .33        6           .92                  .95
 (.67)    (.33)      (6)         (.92)                 .95
  .67      .33        9           .88                  .97
  .67      .33       18           .92                  .99
  .60      .40        5           .89                  .94
  .60      .40       10           .97                  .97
  .50      .50        2           .90                  .84
  .50      .50        8           .88                  .96
 (.50)    (.50)      (8)         (.94)                 .96
  .50      .50       12           .95                  .98
  .50      .50       20           .94                  .98

Note: Numbers in parentheses are for the repeat judgments
TABLE 4

Fit of the Model for Individual Subjects

  Ss      θ̂        β̂       p̂_c      MAD     Sig.

   1     .00       -        -       .051     ns
   2     .00       -        -       .062     ns
   3     .02      .01      .03      .002     ns
   4     .02      .14      .20      .025     ns
   5     .02      .20      .24      .040     ns
   6     .02      .23      .27      .113     ns
   7     .05      .00               .007     ns
   8     .10     1.00      .50      .052     ns
   9     .11      .11      .17      .025     **
  10     .13      .06               .037     *
  11     .15      .00      .00      .081     *
  12     .17      .04      .09      .069     **
  13     .24      .01      .03      .051     ns
  14     .24      .21      .25      .031     ns
  15     .28    10.90      .84      .051     ***
  16     .30    60.00      .95      .052     ns
  17     .36      .01               .052     ***
  18     .36     1.00      .50      .030     **
  19     .37      .02      .06      .077     **
  20     .37      .08      .14      .033     ***
  21     .37      .12      .18      .010     ns
  22     .42      .04      .09      .079     ns
  23     .42      .14      .20      .057     ns
  24     .44      .06      .12      .027     ***
  25     .48      .02      .06      .088     **
  26     .50      .01      .03      .023     ***
  27     .55      .02      .06      .046     ***
  28     .64      .11      .17      .053     ***
  29     .84     1.50      .57      .070     *
  30     .93      .89      .48      .069     ***
  31    1.34      .01      .03      .089     ***
  32    1.83      .03      .08      .106     ***

  *    p < .05 (Wilcoxon test)
  **   p < .01
  ***  p < .001
  ns   not significant

adjusted for n at p = 0 and 1, while others had no clearly discernible
strategy. This helps to explain why the MAD for subjects with θ̂ ≤ .10 is
not close to zero, as would be expected if they simply used p for making
their judgments. Indeed, subject 6 (θ = .02) had the highest MAD of the 32
subjects. Thus, there seem to be idiosyncratic ways of making probability
judgments that are not captured by equation (8).

The above should not detract from the fact that a majority of subjects
did show a significant adjustment in accord with the theory. We illustrate
this by the results of five subjects, each of whom represents a different
combination of θ and β parameters. This is shown in Table 5. Subject
26 illustrates the use of a highly consistent strategy in which downward
adjustments are made over almost the entire range of p. Subject 18 also has
a consistent strategy involving adjustments, but p̂_c = .50, implying that
adjustments will be down when p > .5, up when p < .5, and no adjustments
at p = .50. The data conform quite closely to this pattern. Subject 15 has
a somewhat less consistent strategy of making small upward adjustments over
most of the range of p (p̂_c = .84). Again, the data are generally consistent
with this interpretation. Subject 3 is included for contrast since, as can
be seen, there was almost total reliance on p (as would be predicted by the
parameter values and low MAD). Subject 32 is shown to illustrate the most
extreme and least consistent adjustment process (which was generally
downward). As is evident from the data, this subject had difficulty in
"controlling" the adjustment process (cf. Hammond & Summers, 1972, on
"cognitive control"). This lack of consistency manifested itself in widely
different adjustments for the same stimuli as well as illogical judgments. An
example of the latter was that evidence of (0:2) was evaluated as stronger
than (2:0) (i.e., .40 vs. .30). The lack of consistency and large amount of
adjusting that characterize subject 32 suggested that there might be a
positive relation between the size of θ̂ and MAD, over subjects. When we
investigated this, the correlation was r = .46 (p < .001). Thus, there seems
to be a connection between the amount of one's adjustment and the ability to
execute it consistently.

Our final results concern the additivity/non-additivity of complementary
probabilities for individual subjects. To illustrate this, we use the
subjects discussed above whose data are displayed in Table 6.

Insert Table 6 about here

The important thing to note is that subject 26 is consistently sub-additive
(and this is predicted quite well by the model); subject 18 is generally
additive, as implied by p̂_c = .50; subject 15 is super-additive, but not
consistently so; subject 3 is additive; subject 32 is both highly sub-additive
and inconsistent. From our perspective, these results strengthen our
interpretation of the θ and β parameters, as well as the general form of
the model.

Experiment  2 

The  purpose  of  this  experiment  was  to  test  the  model  using  different 
scenarios.  However,  it  is  not  clear  that  changing  the  content  of  a  scenario 
would  leave  the  credibility  of  the  source  unchanged.  Therefore,  rather  than 
trying  to  match  the  perceived  accuracy  of  the  sources  in  the  new  scenarios  to 
the  source  in  the  car  accident  story,  we  tried  to  vary  the  credibility  of  the 
reporting source and the dissimilarity of the competing signals. The
following scenarios were used: (1) A taste test where people had to identify a
soft drink (Coke vs. Pepsi); (2) A bank robbery where witnesses said the robbers
spoke to each other in a foreign language (German vs. Italian); (3) An
experiment where 6-year-old children had to identify words flashed on a screen
(ROT vs. BED); and, (4) Experts investigating the cause of a fire (arson vs.
short-circuit). The scenarios vary in the degree to which one expects the
reporting source to be accurate, with the least accuracy occurring in scenario
(1) and the most in (4). Scenarios (2) and (3) are intermediate in this regard
since the sensitivity of the source in the circumstances is questionable
although the competing signals are dissimilar. While the above manipulation is
useful for exploring the generality of the model across different content, a more
systematic  experimental  manipulation  of  credibility  and  dissimilarity  will  be 
discussed  in  experiment  3. 

Subjects  and  Procedures 

Thirty-two  additional  subjects  participated  in  this  experiment.  Eight 
subjects  were  randomly  assigned  to  each  scenario  condition  and  they  judged  the 
likelihood  that  one  or  other  position  was  true.  There  were  48  stimuli  as  in 
experiment  1,  and  all  other  procedures  were  identical. 

Results 

We consider the fit of the model to the aggregate data in each scenario
(i.e., the dependent variable is S(f:c)). These results are shown in Table
7. The basic finding is that the model fits these data quite well. Moreover,
the results for additivity are exactly what one would expect on the basis of
the θ̂ and β̂ parameters. The most interesting finding (and one that we
explore in the next experiment) concerns the differences in θ̂ and β̂
across scenarios. Consider the "taste-test" scenario first and note that θ̂
is high and the cross-over point is .51. This means that subjects were
adjusting their responses toward .50, which makes sense in this situation.

TABLE 7

Fit of the Model in Four Scenarios

        Taste-test   Bank Robbery   Word Recognition   Fire
θ̂          .75           .35              .35           .25
β̂         1.10           .00              .30          1.40
p̂_c        .51           .00              .30           .55
MAD        .026          .036             .026          .025

That is, when the stimuli are highly similar and the source non-expert, one
expects  data  that  do  not  discriminate  between  hypotheses.  Thus,  when  subjects 
see  results  that  are  discrepant  from  .50,  they  "regress"  their  judgments 
toward  .50.  Now  consider  the  "bank  robbery"  and  "word  recognition"  scenarios. 
Here, the θ̂ values are lower and the cross-over points imply that the data tend
to be adjusted downward over most of the range of p. Therefore, these
scenarios seem to engender a more "conservative" strategy than the taste-test.
Finally, the results for the "fire" scenario show the lowest value of
θ̂. This is consistent with the view that the source in this scenario consists
of experts and should therefore be adjusted least.
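
The relation between β̂ and the cross-over point in Table 7 can be checked numerically. The following is only a sketch: it assumes that the adjustment model has the form S(p) = p + θ(1 - p - p^β), so that the cross-over point p_c is the value of p at which 1 - p - p^β = 0. That functional form is an assumption made here for illustration (the defining equation is given earlier in the paper), and the function name is ours.

```python
def crossover_point(beta, tol=1e-10):
    """Solve 1 - p - p**beta = 0 for p in [0, 1] by bisection.

    Assumes the model form S(p) = p + theta * (1 - p - p**beta); under that
    assumption the cross-over point depends on beta only.
    """
    if beta == 0:
        return 0.0  # p**0 = 1, so 1 - p - 1 = 0 holds only at p = 0
    lo, hi = 0.0, 1.0
    # g(p) = 1 - p - p**beta is decreasing for beta > 0, with g(0) = 1, g(1) = -1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1 - mid - mid**beta > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Estimated beta values as they appear in Table 7
table7_beta = {"Taste-test": 1.10, "Bank Robbery": 0.00,
               "Word Recognition": 0.30, "Fire": 1.40}
for scenario, beta in table7_beta.items():
    print(scenario, round(crossover_point(beta), 3))
```

Under the assumed form, the computed values fall within roughly .01 of the tabled cross-over points, and β = 1 gives a cross-over point of exactly .50.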

Experiment  3 

We  had  two  goals  in  conducting  experiment  3.  First,  we  wished  to  test 
systematically  for  the  effects  of  source  credibility  and  signal  dissimilarity 
on the parameters of the model. In accordance with our theory, θ should
decrease as source credibility and signal dissimilarity increase. In addition,
we hypothesized that β (and thus p_c) would decrease as ambiguity
increased; that is, we expected attitudes toward ambiguity to become more
conservative  with  increases  in  ambiguity.  Second,  we  wished  to  investigate 
the  importance  of  individual  differences  in  the  way  people  cope  with  the 
ambiguity  inherent  in  our  judgment  task. 


METHOD 


Design.  Two  levels  (high/low)  of  source  credibility  and  dissimilarity  of 
signals  were  crossed  in  a  2  x  2  factorial  design.  In  addition,  four  different 
content  scenarios  were  constructed  that  varied  on  all  four  experimental 
combinations  (resulting  in  16  different  stories).  Subjects  were  asked  to 
judge 21 stimuli that varied in p and n (see below) for each of the four
content-distinct scenarios. Thus, each subject initially made 84 probability
judgments. However, in order to reduce boredom in the task, subjects made
judgments in all four scenarios, with each scenario representing one of the
four experimental conditions. For example, subject 1 received scenario A in the
high/high condition, scenario B in the high/low condition, and so on. A four-person
Latin square was set up so that every scenario appeared an equal number
of  times  in  each  experimental  condition.  Finally,  since  subjects  made 
judgments  in  one  scenario  under  the  high/high  condition,  the  same  scenario  was 
also  given  in  the  low/low  condition  (and  the  order  was  counter-balanced).  In 
this  way,  we  were  able  to  examine  each  subject's  judgments  holding  the  content 
of  the  scenario  constant.  This  part  of  the  experiment  required  21  additional 
judgments,  making  the  total  number  of  responses  for  each  subject  equal  to  105. 
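
The counterbalancing described above can be illustrated with a small sketch. The cyclic construction, the scenario labels A to D, and the ordering of the condition labels (credibility/dissimilarity) are assumptions made for illustration; the particular square used in the experiment is not specified.

```python
SCENARIOS = ["A", "B", "C", "D"]  # hypothetical labels for the four content scenarios
CONDITIONS = ["high/high", "high/low", "low/high", "low/low"]  # assumed order: credibility/dissimilarity


def latin_square_assignment(scenarios=SCENARIOS, conditions=CONDITIONS):
    """Cyclic 4 x 4 Latin square: row i maps each scenario to subject i's condition."""
    k = len(scenarios)
    return [
        {scenarios[(i + j) % k]: conditions[j] for j in range(k)}
        for i in range(k)
    ]


for subject, row in enumerate(latin_square_assignment(), start=1):
    print(subject, row)
```

In this sketch, the first subject receives scenario A in the high/high condition and scenario B in the high/low condition, and across the four subjects of a square every scenario appears once in each condition.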

Stimuli.  The  four  content  scenarios  used  involved  the  car  accident  from 
experiment  1,  the  word-recognition  task  from  experiment  2  and  two  new  stories. 
These latter scenarios involved determining the name of a play from an excerpt
and the diagnosis of a medical condition. Four versions of each scenario
were constructed to reflect different levels of credibility and dissimilarity
(e.g., in the word-recognition task, we had 15- vs. 6-year-olds and BED vs. ROT
as opposed to BED vs. BID). Within each scenario, subjects were given 21
stimuli that reflected the amount of evidence for each hypothesis. The values
of the stimuli were different from those used in experiments 1 and 2 in that
smaller values of n were used in order to provide more sensitive tests of
the model. The stimuli used were: for p = 0, 1, n = 1, 2, 6; for p = .125,
.875, n = 8; for p = .2, .8, n = 5; for p = .25, .75, n = 4; for p = .33, .67,
n = 6, 9; for p = .67, n = 3; for p = .4, .6, n = 5; for p = .5, n = 2, 8.
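
As a check on the design arithmetic, the stimulus list can be enumerated directly. This is a sketch only: the specification below copies the p and n values listed above, and the reading of each stimulus as f reports favoring the hypothesis and c reports against it (with n = f + c) is our assumption about the (f:c) notation.

```python
# (p values, n values) as listed in the Stimuli paragraph
STIMULI_SPEC = [
    ((0.0, 1.0), (1, 2, 6)),
    ((0.125, 0.875), (8,)),
    ((0.2, 0.8), (5,)),
    ((0.25, 0.75), (4,)),
    ((0.33, 0.67), (6, 9)),
    ((0.67,), (3,)),
    ((0.4, 0.6), (5,)),
    ((0.5,), (2, 8)),
]


def enumerate_stimuli(spec=STIMULI_SPEC):
    """Return (f, c, n) triples, where f = round(p * n) and c = n - f."""
    stimuli = []
    for p_values, n_values in spec:
        for p in p_values:
            for n in n_values:
                f = round(p * n)  # .33 and .67 stand for thirds, so round to whole reports
                stimuli.append((f, n - f, n))
    return stimuli


stimuli = enumerate_stimuli()
# Four scenarios, plus the one scenario repeated in the low/low condition
judgments_per_subject = 4 * len(stimuli) + len(stimuli)
print(len(stimuli), judgments_per_subject)
```

The enumeration yields 21 distinct stimuli per scenario, and 4 x 21 + 21 = 105 judgments per subject, in agreement with the totals given in the Design paragraph.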


Subjects and Procedures. Thirty-two subjects participated in this
experiment (comprising eight 4-person Latin squares). Subjects were paid $5 per
hour  and  the  task  took  about  one  hour  to  complete.  The  tasks  were  presented 
in  booklets  and  after  each  series  of  21  judgments,  subjects  were  either  given 
a  break  or  another  task.  At  the  end  of  the  experiment,  a  manipulation  check 
was  performed  on  the  credibility  and  dissimilarity  induction.  Specifically, 
each  subject  was  asked  to  rate  (using  a  0-100  scale)  the  credibility  of  the 
source  and  the  confusability  of  the  signals  in  all  four  scenarios.  Since  each 
scenario  had  high  and  low  levels  of  each  factor,  the  subjects  rated 
credibility  and  dissimilarity  under  both  conditions.  Therefore,  subjects  made 
4  judgments  on  each  of  the  4  scenarios. 

Results 

Before presenting the main results, we note that the manipulation check
showed that subjects did, on average, see the "high" credibility versions of
the same scenarios as more credible than the low (80 vs. 47), and the high
dissimilarity signals as less confusable than the low (30 vs. 62). However, it
should  be  noted  that,  in  absolute  terms,  the  low  credibility/low  dissimilarity 
conditions  were  not  extreme.  Therefore,  contrary  to  our  intentions,  we  failed 
to  induce  a  situation  of  low  ambiguity  in  the  low/low  manipulation  similar  to 
the  taste-test  scenario  of  experiment  2.  (Recall  that  the  latter  could  be 
considered  low  in  ambiguity  since  subjects  essentially  treated  the  reports  as 
emanating from a random process with p = .5.)

(1) General fit of the model: For each subject in each experimental
condition, the model was fit to yield estimates of θ and β (this resulted
in 160 models = 32 subjects x 5 models). The fit of the individual models
was comparable to that of experiments 1 and 2 (median MAD = .042 over all
conditions).

(2) Manipulation of θ and β: We first consider the effects of the
experimental manipulation on the estimated θ parameters. The appropriate
analysis-of-variance (2 x 2 x Latin square) was performed using θ̂ as the
dependent variable and the results showed a significant main effect for
"credibility" (p < .001), no main effect for "dissimilarity," and a three-way
interaction of scenario x credibility x dissimilarity (p < .02). The results
for the main effects are shown in Table 8. The table shows that θ̂ does
increase as the credibility of the source decreases, thereby confirming our
prediction. However, there was no effect for dissimilarity, contrary to our
prediction. The three-way interaction showed that in two scenarios, the
dissimilarity of the signals had a large effect on θ̂ when
credibility was low, while in the other two scenarios, dissimilarity had a
large effect when credibility was high. However, it is not clear why this
occurred and we do not consider it further.

In  addition  to  the  above  analysis,  recall  that  each  subject  also  received 
the  same  scenario  in  the  high/high  and  low/low  conditions.  A  comparison  of 
the means of the estimated θ's in these two conditions also showed a
significant difference in the hypothesized direction; i.e., mean θ̂ = .17 in the
high/high condition, mean θ̂ = .29 in the low/low (p < .004 by a paired t-test).
Thus, with the exception of an effect for the dissimilarity of the signals,
our hypotheses concerning θ are supported by the experimental data.

We now turn to the results for the β parameter. However, since β̂ is
highly skewed, we substitute its corresponding p_c value in the analyses.
First, the analysis-of-variance using p̂_c as the dependent variable
showed only a significant main effect for credibility; i.e., p̂_c = .23 for low
credibility, p̂_c = .32 for high (p < .002). In addition, when subjects
judged the same story in the high/high and low/low conditions, the analysis
showed the same effect; viz., p̂_c = .36 in the high/high condition,
p̂_c = .25 in the low/low (p < .05, paired t-test). Therefore, as the
credibility of the source increased, the cross-over point in the model also
increased.

(3)  Individual  differences:  We  now  consider  the  following:  (a)  can 
subjects  be  characterized  as  having  a  general  strategy,  as  measured  by  the 
consistency of their θ̂ and β̂ values, in different scenarios?; (b) is the
amount of one's adjustment, as measured by θ̂, systematically related to the
consistency of executing one's strategy?; (c) can individual perceptions of
the credibility of the source and the dissimilarity of the signals account for
variance in θ̂ and β̂ within each of the experimental conditions?

(a) Recall that for each subject, four different scenarios were given and
a model fit to the data in each. Therefore, each subject can be characterized
by four θ̂'s, β̂'s, and MAD's. To determine if the parameter values were
more alike within a subject than between subjects (this is measured by the
intra-class correlation), a one-way repeated-measures analysis-of-variance was
performed (32 x 4) for θ̂, p̂_c, and MAD (Winer, 1963, chap. 3). The results
showed that for θ̂, r = .73 (p < .001); for p̂_c, r = .68 (p < .001); and
for MAD, r = .86 (p < .001). These results are particularly striking when it
is realized that the four scenarios varied over the four experimental
conditions. In spite of this, the results show strong and stable individual
strategies in the amount that is adjusted (θ̂), the direction of the adjustments
(p̂_c or β̂), and the consistency of executing one's strategy (MAD).
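
For readers unfamiliar with the intra-class correlation, the following sketch shows one common way of computing it from a one-way analysis-of-variance. The exact variant used in the analysis above is not stated, so the formula here (often written ICC(1)) is an assumption, and the data are hypothetical.

```python
def intraclass_correlation(scores):
    """ICC(1) from a one-way ANOVA.

    scores[i][j] is the value (e.g., an estimated parameter) for subject i
    in scenario j. Returns (MSB - MSW) / (MSB + (k - 1) * MSW).
    """
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of measurements per subject
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(scores, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical data: three subjects, four scenarios each
example = [
    [0.10, 0.12, 0.09, 0.11],
    [0.40, 0.35, 0.42, 0.38],
    [0.75, 0.80, 0.70, 0.78],
]
print(round(intraclass_correlation(example), 2))
```

A value near 1 indicates that the four values obtained from a given subject resemble one another much more than they resemble the values of other subjects.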

(b) In both experiments 1 and 2, we found a significant positive correlation
between θ̂ and MAD. We examined this in experiment 3 and found the
same positive relation in three of the four scenarios (r = .67, .48, .40,
.10). Thus, our interpretation of θ as reflecting a cognitive simulation
process is strengthened by the generality of this finding.

(c)  Since  each  subject  made  independent  judgments  of  the  credibility  and 
confusability  of  the  experimental  stimuli,  we  were  also  able  to  investigate 
how these judgments related to the θ̂ and β̂ parameters within experimental
conditions. To do so, we first re-analyzed our data with a regression model
where θ̂ was the dependent variable, and the individual ratings of
credibility and confusability, together with dummy variables representing the
different scenarios, were the independent variables. More precisely, there is
a regression equation of this type for each of the four experimental
conditions.  However,  these  four  equations  can  be  estimated  more  efficiently 
as  a  single  model  using  Zellner's  (1962)  procedure  for  "seemingly  unrelated" 
regressions.  The  multiple  R  estimated  by  this  procedure  was  .44  (with  an 
adjusted  R  of  .35).  Of  the  independent  variables,  there  was  no  effect  for 
either  scenarios  or  confusability.  However,  all  four  coefficients  for 
credibility in the different experimental conditions were significant
(p < .02) and of the hypothesized sign (i.e., a negative relation between
θ̂ and ratings of credibility). When the same regression technique was used
with p̂_c as the dependent variable, similar results were obtained. We
interpret these results as strengthening the conclusions drawn from the more
standard ANOVA of our study; that is, θ̂ and p̂_c are not only affected by
different  levels  of  credibility  across  all  subjects,  they  also  covary 
significantly  with  individual  perceptions  of  credibility  within  each  of  these 
levels. 

Experiment  4 

The  purpose  of  this  experiment  was  to  answer  the  following  question:  Can 
individuals' choices between gambles be predicted from knowledge of their θ
and β parameters obtained from a separate inference task? To examine this,
subjects were first asked to make judgments as in experiments 1-3 and both
θ and β were estimated as before. The subjects were then asked to choose
(or express indifference) between 9 pairs of gambles involving the outcome
from an urn with known p, versus the occurrence of an event on the basis of
unreliable reports. If θ and β are capturing aspects of ambiguity that
affect  choice,  knowledge  of  these  parameters  should  allow  one  to  predict 
individual  choices  in  addition  to  inferences. 

Subjects.  Twenty  subjects,  recruited  from  the  University  of  Chicago 
community,  participated  in  this  study.  They  were  paid  $5/hour. 

Stimuli.  For  the  inference  task,  two  different  scenarios  were  used: 
the car-accident story from experiment 1, and the taste-test story from
experiment 2. These were chosen because the θ̂ and β̂ values were quite
different in the two cases. In both scenarios, subjects received 40 combinations
of p and n that were identical to those used in the previous
experiments.  The  stimuli  for  the  choice  task  involved  one  of  the  following: 

(a) For those in the car-accident task, a gambling situation was set up
involving  the  choice  between  betting  that  a  ball  drawn  from  an  urn  with  known 
p  was  green,  versus,  betting  that  the  car  that  caused  the  accident  was  green 
based  on  witnesses'  reports  of  the  car  color;  (b)  For  those  in  the  taste-test 
scenario,  the  choice  was  similarly  between  betting  that  the  outcome  from  an  urn 
was  a  certain  color,  versus,  betting  that  the  drink  was  Pepsi-Cola.  In  both 
scenarios,  subjects  were  told  to  imagine  that  their  payoff  for  being  correct 
would be $10. Thus, the payoffs for the urn gamble and the bet involving the
report of some event were equal. Within scenarios, each subject saw 9 pairs of
gambles that varied in the proportion of colored balls in the urn and the
proportion of reports favoring the particular hypothesis. These proportions
were always the same in the two bets. The exact values of p used in the 9
pairs  were:  1,  .875,  .75,  .625,  .50,  .375,  .25,  .125,  and  0.  The  number  of 
balls  in  the  urn  and  the  number  of  reports  were  held  constant  at  8. 

Procedure.  The  twenty  subjects  were  randomly  assigned  to  one  of  the  two 
scenarios.  The  procedure  for  the  inference  task  was  identical  to  the  previous 
experiments.  After  subjects  finished  the  inference  task,  they  were  presented 
with  the  appropriate  choice  task.  The  nature  of  the  two  gambles  was 
explained,  and  subjects  were  then  asked  to  choose,  or  indicate  indifference, 
between  the  gambles.  If  they  were  not  indifferent,  they  were  also  asked  to 
indicate  their  strength  of  preference  on  a  4-point  scale  (from  "little"  to 
"great  deal").  After  doing  this  for  one  value  of  p,  they  turned  the  page 
and  made  another  choice  (and  strength  of  preference  rating,  if  appropriate)  at 
the  next  level  of  p.  This  continued  until  all  9  pairs  had  been  considered. 
Therefore,  for  each  subject,  there  were  9  choices  between  an  unambiguous  bet 
from  an  urn  with  known  p,  versus  an  ambiguous  bet  that  an  event  occurred,  on 
the  basis  of  the  proportion  of  favorable  reports  from  an  unreliable  source. 

Results.  Since  each  subject  first  participated  in  the  inference  task,  we 
briefly  consider  these  results  before  discussing  the  choice  data.  As 
expected, there were marked differences in the θ̂ and β̂ parameters in the
two scenarios. The medians for θ̂ and p̂_c (implied by β̂) were .13 and
.11, respectively, in the car-accident scenario. For the taste-test, the
median θ̂ was 1.35 and median p̂_c = .45. Thus, the taste-test scenario
induced  much  adjustment,  with  a  cross-over  point  near  .50,  while  the  car- 
accident  story  induced  less  adjustment  but  a  lower  cross-over  point. 

To  compare  each  subject's  choices  with  predictions  from  the  inference 
model, the following procedure was used: any combination of θ̂ and p̂_c
implies when and where S(P_A) is greater than, less than, or equal to P_A (see
equation (13)). Thus, for each subject, when P_A > S(P_A), we predicted the
urn would be chosen over the bet based on unreliable reports; when S(P_A) >
P_A, the opposite prediction was made; when S(P_A) = P_A, we predicted
indifference between the two gambles. Note that when θ̂ = 0, we always
predicted indifference between the gambles since S(P_A) = P_A for all P_A.
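
The prediction rule just described can be sketched as follows. The sketch assumes that the model has the form S(P_A) = P_A + θ(1 - P_A - P_A^β); this form, the function names, and the numerical tolerance used to declare equality are assumptions made for illustration and are not taken from the procedure itself.

```python
def adjusted_probability(p, theta, beta):
    """Assumed model form: S(p) = p + theta * (1 - p - p**beta)."""
    return p + theta * (1 - p - p**beta)


def predict_choice(p, theta, beta, tol=1e-9):
    """Predict 'urn', 'report', or 'indifferent' for one pair of gambles."""
    s = adjusted_probability(p, theta, beta)
    if abs(s - p) <= tol:   # tolerance is an illustrative choice
        return "indifferent"
    return "report" if s > p else "urn"


# The nine levels of p used in the choice task
P_LEVELS = [1, .875, .75, .625, .50, .375, .25, .125, 0]

# Hypothetical parameter values for a single subject
theta_hat, beta_hat = 0.5, 1.0
predictions = [predict_choice(p, theta_hat, beta_hat) for p in P_LEVELS]
print(predictions)
```

With these hypothetical values the urn is predicted above the cross-over point (.50), the bet on the reports below it, and indifference at it; with θ = 0 the rule predicts indifference at every level of p.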

In Table 9, we show the θ̂ and p̂_c values for each subject (grouped by
scenario), and the number of correct choice predictions by subject.

Insert Table 9 about here

To evaluate how well the choices were predicted from knowledge of θ̂
and p̂_c, we used a random baseline for comparison; i.e., for each of the 9
choices made by a subject, there were three possible outcomes: urn, report, or
indifference. Since the probability of randomly predicting the correct
response is 1/3, we computed the probability of getting at least r hits in 9
trials on the basis of chance (using the binomial distribution). This probability
is shown in the last column of Table 9. For example, subject 1 was
correctly predicted in 8 of the 9 choices; the probability of getting at least
this many hits by chance is .001. Thus, we rejected the hypothesis that our
predictions  for  this  subject  were  no  better  than  chance.  Using  this  method 
for  all  subjects,  it  can  be  seen  that  5  of  the  10  subjects  in  the  car-accident 
scenario,  and  4  of  10  in  the  taste-test,  are  well  predicted  using  a  type  I 
error level of .05. If this error level were increased to .15, a majority of
subjects (12/20) would be accurately predicted from their inference parameters.
In any event, at the aggregate level (over subjects and scenarios),
there were 103 hits out of 179 predictions (one response was missing). The
probability of getting at least this many hits by chance is minuscule.
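
The chance baseline can be reproduced with a short calculation. The sketch below computes the upper tail of a binomial distribution with success probability 1/3; the function name is ours, and applying the same calculation to the 179 pooled predictions treats them as independent trials, which is a simplifying assumption.

```python
from math import comb


def prob_at_least(r, n=9, p=1/3):
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, n + 1))


# Subject 1: 8 hits in 9 choices
print(round(prob_at_least(8), 3))

# Aggregate: 103 hits in 179 predictions
print(prob_at_least(103, n=179))
```

For 8 hits in 9 trials the tail probability is 19/19683, which rounds to .001 as reported. Under this baseline, 6 or more hits out of 9 has a chance probability below .05, and 5 or more has a chance probability below .15.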

Our final results concern the strength of preference ratings. Recall
that in addition to choosing between gambles, subjects were asked to rate
their strength of preference on a 4-point scale. These ratings supplement our
analysis of the choice data in the following way: in each scenario, the
number of prediction errors was 38. However, in the taste-test, θ̂ is much
larger than in the car-accident scenario. Since θ is directly related to
the amount of adjustment to P_A, the differences between S(P_A) and P_A
should be larger in the taste-test than in the accident story. Furthermore,
the larger the differences, the stronger one's preferences should be since
they are further away from indifference (where P_A = S(P_A)). We tested this
by comparing the mean strength-of-preference ratings in the two stories across
the nine levels of p. These results are shown in Table 10. First, note that
the means for the taste-test are larger than the car-accident at every level
of p. Second, the pattern of means is consistent with the general form of
the model in that preferences are strongest at p = 1, decrease as p
approaches p_c, and then increase again at p = 0. Therefore, the strength-of-preference
data are consistent with both the difference in the sizes of θ̂
for the two scenarios, as well as the general form of the model.

Insert Table 10 about here

DISCUSSION 

We  now  discuss  the  implications  of  our  theory  and  results  with  respect  to 
the following issues: (1) the importance of ambiguity in assessing perceived
uncertainty; (2) the use of cognitive strategies in probabilistic judgments
under ambiguity; (3) the role of ambiguity in risky choice; and (4) extensions
of the model to multiple sources and time periods.


Ambiguity  and  the  Assessment  of  Uncertainty 

The  concept  of  ambiguity  highlights  the  distinction  between  one's  lack  of 
knowledge  of  the  process  that  generates  outcomes  and  the  uncertainty  of 
outcomes  conditional  on  some  model  of  the  process.  The  fact  that  there  are  at 
least  two  sources  of  uncertainty  in  most  situations  leads  to  the  irony  that 
one  needs  a  well-defined  model  to  give  precise  estimates  of  how  much  one 
doesn't  know.  Indeed,  the  usefulness  of  formulating  well-defined  stochastic 
processes  is  in  eliminating  ambiguity  so  that  outcome  uncertainty  can  be 
quantified.  Thus,  when  coins  are  "fair"  or  random  drawings  are  taken  from 
urns  with  known  p,  there  is  no  second-order  uncertainty.  Furthermore,  the 
conditional  nature  of  uncertainty  is  implicitly  recognized  in  various  attempts 
to  quantify  and  improve  inferential  judgments.  For  example,  consider  how 
uncertainty  is  defined  in  the  "lens  model"  (Hammond,  et  al.,  1964).  In  this 
case,  the  uncertainty  in  the  environment  is  measured  as  the  residual  variance 
not  accounted  for  by  a  well-formulated  ecological  model.  Thus,  unexplained 
variance  or  uncertainty  is  conditional  on  the  model  of  how  particular  cues 
combine to form the criterion of interest. Now consider the work of Nisbett
and  colleagues  on  trying  to  improve  probabilistic  judgments  through  training 
(Nisbett,  et  al.,  undated;  Jepson,  et  al.,  in  press).  They  argue  that 
training  and  experience  can  allow  one  to  see  the  underlying  structure  of  real- 
world  problems  so  that  the  appropriate  model  can  be  used  for  making  better 
judgments.  Thus,  the  focus  of  their  training  is  on  making  various  statistical 
principles  (e.g.,  regression-to-the-mean,  law  of  large  numbers,  use  of  base 
rates,  etc.)  more  obvious  in  everyday  inferences. 

While  the  conditional  nature  of  uncertainty  has  been  implicitly 
recognized, ambiguity results from its explicit recognition; i.e., by
realizing that the "model" is itself subject to uncertainty. Indeed, one
could argue that the cost of urn models, coin-flipping analogies, and the
like, is that they can obscure the fact that most real world generating
processes are not precisely known. Furthermore, even if a process is well-
defined at one point in time, the parameter(s) of the process can change over
time, resulting in ambiguity as well as uncertainty. For example, imagine
that you have been asked to evaluate the research output of a younger
colleague being considered for promotion. Your colleague has produced 11
papers;  of  these  the  first  9  (in  chronological  order)  represent  competent, 
albeit  unexciting  scholarly  work.  On  the  other  hand,  the  latter  2  papers  are 
quite  different;  they  are  innovative  and  suggest  a  creativity  and  depth  of 
thought  absent  from  the  earlier  work.  What  should  you  do?  As  someone  who  is 
aware  of  regression  fallacies,  you  might  consider  the  two  outstanding  papers 
as  outliers  from  a  stable  generating  process  and  thus  predict  regression-to- 
the-mean. Alternatively, you might consider the outstanding papers as
"extreme" responses that signal a change in the generating process; i.e., a
new  and  higher  mean.  If  this  were  the  case,  the  same  general  regression  model 
would  predict  future  papers  of  high  quality  (regression  to  a  higher  mean).  If 
one  asks  what  is  the  nature  of  the  signaling  in  this  case,  it  is  obvious  that 
the  chronological  order  of  the  papers  is  crucial.  Indeed,  imagine  that  the 
outstanding  papers  were  the  first  two  that  were  written;  or  consider  that  they 
were the second and sixth. Each of these cases suggests a different underlying
model and perhaps a different prediction. In any event, the uncertainty
associated with any prediction is obviously complicated by the ambiguity
regarding the appropriate mean of the regression process.

Cognitive Strategies in Inferences Under Ambiguity

We  have  assumed  that  people  use  an  anchoring-and-adjustment  strategy  in 
making inferences under ambiguity. However, whereas the term "anchoring-and-adjustment"
is quite general and could encompass a wide range of models (cf.
Lopes, 1981; Einhorn & Hogarth, in press), we have been quite specific as
to  the  nature  of  this  process  in  our  tasks.  Of  greatest  interest  in  this 
regard  is  the  idea  that  adjustments  are  based  on  a  mental  simulation  in  which 
"what  might  have  been"  is  combined  with  "what  is"  (the  anchor).  The  rationale 
for  this  comes  from  the  fact  that  the  evaluation  of  evidence  often  involves  an 
implicit  comparison  process  (similar  to  the  perception  of  figure  against 
ground).  Thus,  when  evaluating  the  strength  of  evidence  for  a  particular 
hypothesis, the evidence that might have been can serve as a convenient
contrast  case  for  comparison.  Furthermore,  since  ambiguity  implies  that 
multiple  models  could  have  produced  the  observed  results,  it  seems  natural  to 
consider  that  different  results  could  have  occurred  on  the  basis  of  different 
underlying  processes. 

The  support  for  the  hypothesized  anchoring-and-adjustment  strategy  comes 
from several sources. First, recall that in our model, the largest adjustments
to the anchor occur at small amounts of evidence. Moreover, as n
increases,  S(f:c)  asymptotes  at  p.  The  results  of  experiments  1-3  support 
this  prediction.  Thus,  the  weight  of  evidence  (to  use  Keynes'  term)  for  "what 
is,"  dominates  "what  might  have  been"  as  the  absolute  amount  of  evidence 
increases.  Furthermore,  the  effect  of  increasing  n  is  to  reduce  the  amount 
of  non-additivity  of  complementary  strengths.  Since  most  of  our  subjects  were 
sub-additive,  our  model  provides  a  psychological  link  to  concerns  expressed  by 
others regarding the appropriateness of the complementarity of probabilities
based on small amounts of evidence (Shafer, 1976; Cohen, 1977). In
particular, Cohen (1977, chap. 3) points out that when one considers an
incomplete system, the lower benchmark on provability is not necessarily
disprovability, but non-provability. For example, if one has a meager amount
of circumstantial evidence supporting a particular theory such that the
probability of the theory's truth is .2, does that imply that the theory is
false with p = .8? One might rather say that the theory is not proven (in a
probabilistic  sense)  as  opposed  to  saying  that  there  is  a  .80  chance  it  is 
wrong.  Furthermore,  the  idea  that  the  complement  of  statements  can  lead  to 
"not-proved"  rather  than  "disproved,"  seems  to  be  deeply  imbedded  in  the 
Anglo-American  legal  system.  Indeed,  in  Scottish  law,  defendants  are  either 
found  guilty,  not-guilty,  or  "not  proven."  The  last  category  is  reserved  for 
those  cases  where  the  evidence  is  too  meager  to  allow  for  a  judgment  of  guilt 
or  innocence. 
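
The first point above, that adjustments to the anchor shrink as n grows, can be illustrated numerically. The sketch below assumes the model has the form S(f:c) = p + (θ/n)[(1-p) - p^β], with p = f/n, which is the form consistent with equations (5) and (A.7) in Appendix A. The function names and the parameter values (θ = 1, β = .5) are hypothetical and chosen only for illustration.

```python
def strength(f, c, theta, beta):
    """Judged strength S(f:c) under the assumed model form.

    f, c  : numbers of favorable and contrary pieces of evidence
    theta : size of the adjustment (assumed parameter value)
    beta  : differential weighting of smaller vs. greater values
    """
    n = f + c
    p = f / n  # the anchor: observed proportion of favorable evidence
    return p + (theta / n) * ((1.0 - p) - p ** beta)


def additivity_gap(f, c, theta, beta):
    """S(f:c) + S(c:f) - 1; negative values indicate sub-additivity."""
    return strength(f, c, theta, beta) + strength(c, f, theta, beta) - 1.0


THETA, BETA = 1.0, 0.5  # hypothetical values, not estimates from the experiments

# Hold p fixed at .75 and increase the absolute amount of evidence n.
for scale in (1, 5, 25, 125):
    f, c = 3 * scale, 1 * scale
    s = strength(f, c, THETA, BETA)
    gap = additivity_gap(f, c, THETA, BETA)
    print(f"n={f + c:4d}  p=0.75  S(f:c)={s:.4f}  S(f:c)+S(c:f)-1={gap:+.4f}")
```

Under these assumed values, S(f:c) moves toward p and the gap moves toward zero as n increases, which is the pattern described in the text.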

Second,  the  fact  that  non-additivity  results  from  a  shift  in  the 
direction  of  the  adjustment  process  is  consistent  with  other  "order  effects" 
due  to  the  use  of  anchoring-and-adjustment  strategies.  For  example,  in  a 
traditional Bayesian revision task, Lopes (1981) found that a change in the
order  in  which  sample  information  was  presented  affected  overall  judgments  by 
changing  the  anchor.  Thus,  consider  having  to  judge  whether  samples  come  from 
an  urn  containing  predominantly  red  or  blue  balls  (70/30  in  both  cases).  You 
first draw a sample of 8 that shows (5R:3B). Thereafter, you draw another
sample  of  8  with  the  result  (7R:1B).  After  each  sample,  you  are  asked  how 
likely  it  is  that  you  have  drawn  from  the  predominantly  red  urn.  When  the 
sample  evidence  is  in  the  order  given  here,  people  seem  to  anchor  on  the  first 
sample  (5:3)  and  then  adjust  up  for  the  second  (stronger)  sample.  However, 
when  the  order  of  the  samples  is  reversed,  people  anchor  on  (7:1)  and  adjust 
down for the weaker, second sample. This effect cannot be accounted for by
assuming that people are using a Bayesian procedure (which treats the two
situations  as  equal),  but  it  does  follow  from  an  anchoring  and  adjustment 
process  in  which  the  anchor  is  weighted  more  heavily  than  the  adjustment. 
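
The contrast between the two accounts can be sketched for the urn example. The Bayesian posterior below follows directly from the 70/30 urn compositions; equal prior odds are assumed. The anchoring-and-adjustment rule is only a hypothetical weighted average (the function `anchored_judgment` and the weight of .6 are illustrative assumptions, not the procedure used by Lopes or the model of this paper).

```python
def posterior_red(samples, p_major=0.7, prior=0.5):
    """Bayesian posterior that the samples came from the predominantly red urn.

    Each sample is a (reds, blues) pair. With 70/30 urns, each sample
    multiplies the odds by (0.7 / 0.3) ** (reds - blues).
    """
    odds = prior / (1.0 - prior)
    ratio = p_major / (1.0 - p_major)
    for reds, blues in samples:
        odds *= ratio ** (reds - blues)
    return odds / (1.0 + odds)


def anchored_judgment(samples, anchor_weight=0.6):
    """Hypothetical anchoring-and-adjustment rule (an assumption for illustration).

    The judgment after the first sample is the anchor; each later sample
    pulls the judgment toward its own stand-alone posterior, with the
    anchor weighted more heavily than the adjustment.
    """
    judgment = posterior_red(samples[:1])
    for sample in samples[1:]:
        judgment = anchor_weight * judgment + (1.0 - anchor_weight) * posterior_red([sample])
    return judgment


weak_then_strong = [(5, 3), (7, 1)]
strong_then_weak = [(7, 1), (5, 3)]

# Bayesian revision is indifferent to order.
print(posterior_red(weak_then_strong), posterior_red(strong_then_weak))

# The anchored rule is not: the final judgment depends on which sample came first.
print(anchored_judgment(weak_then_strong), anchored_judgment(strong_then_weak))
```

Any rule that weights the anchor more heavily than the adjustment produces this kind of order dependence; the particular weight only changes its size.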

Third,  the  results  of  experiment  3  provide  important  evidence  regarding 
the  process  assumed  to  underlie  the  model.  In  addition  to  the  fact  that  the 
experimental manipulation of source credibility affected θ and β as
predicted, two other results were found: a positive correlation between θ
and MAD, and the stability of individual differences in parameter values
across  scenarios.  The  first  result  bears  directly  on  the  nature  of  the 
adjustment  process  since  it  suggests  that  there  is  a  "cost"  of  engaging  in  a 
mental  simulation;  namely,  a  concomitant  lack  of  control  over  one's  strategy 
(Hammond  &  Summers,  1972).  The  second  result  suggests  that  there  may  be 
strong  personal  propensities  in  evaluating  evidence  that  transcend  the 
particular  content  of  scenarios.  While  it  is  too  early  to  explicate  the 
nature  of  these  individual  differences,  their  existence  lends  support  to  the 
idea that θ and β are capturing important aspects of the process that
determines  judgments  under  ambiguity. 

While  our  model  accounts  for  the  rather  simple  inferences  we  have 
studied, it also relates to an important class of inferences that result from
"surprise." Consider Figure 5, which shows one's expectations for p as a
function of the credibility of the source and the dissimilarity of the signals.
First,  note  that  when  credibility  and  dissimilarity  are  high,  one  expects  p  to 
be  very  high  or  low  (recall  our  earlier  example  of  cameras  taking  pictures  of  a 
bank  robber).  However,  imagine  that  one  camera  showed  the  bank  robber  to  be 
white,  and  the  other  showed  him  to  be  black.  Such  a  result,  where  p  =  .5,  would 
be surprising given the credibility of cameras and the dissimilarity of white and
black robbers. Indeed, the data "are not good enough," which is represented by
the  range  of  p  indicated  by  the  two-headed  arrow.  Second,  consider  the  low 
credibility-low  dissimilarity  situation;  e.g.,  the  taste-test  scenario.  Imagine 
that  you  were  told  that  of  the  20  people  in  the  Pepsi  vs.  Coke  taste-test,  all 
correctly identified the drink as Pepsi. Such a result, where p = 1, would
be  surprising.  However,  this  type  of  surprise  is  one  where  the  "data  are  too 
good"  rather  than  not  good  enough.  Thus,  there  are  two  types  of  surprise  and 
both occur when ambiguity is low. Indeed, when ambiguity is high, expectations
are weak and surprise (which results from a violation of expectations)
is  unlikely.  This  situation  characterizes  the  off-diagonal  cells  in  the 
figure  and  accounts  for  our  labeling  of  these  cells,  "little  surprise." 

Although  our  conceptual  scheme  makes  clear  when  surprise  is  likely  to 
occur, it cannot handle the variety of possible reactions that surprise can engender.
For example, when data are not good enough, it is possible to reduce the
credibility  of  the  source  (e.g.,  the  cameras  were  broken),  synthesize  the 
hypotheses  (there  were  two  bank  robbers,  one  white  and  the  other  black),  or 
otherwise  make  sense  of  the  data  by  changing  the  story  (e.g.,  there  were  two 
bank  robberies  on  successive  days).  On  the  other  hand,  when  data  are  too 
good,  inferences  of  fraud,  collusion,  and  the  like,  are  possible  (see,  e.g., 
Kamin, 1974, on Burt's twin data; Bishop, Fienberg, & Holland, 1975, on
Mendel's  pea  experiments).  An  interesting  aspect  of  such  inferences  is  that 
the  surface  meaning  of  the  data  can  suggest  the  opposite  conclusion;  e.g., 
consider  someone  who  "protesteth  too  much,"  or  a  suspect  who  was  "framed"  for 
a  crime.  Indeed,  this  is  implied  by  our  model.  Specifically,  consider  the 
case of totally unreliable data, which occurs when UR = 1 or θ = n (see
equation (5)). In this case,
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Thus,  as  p  increases,  S(f:c)  decreases.  More  generally,  as  UR 
increases, it will reach a point, conditional on p and β, where the
evidence  for  a  hypothesis  will  be  counted  against  it. 
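
The totally unreliable case can be worked through as a short sketch. It assumes that the model has the form S(f:c) = p + (θ/n)[(1-p) - p^β], which is the form consistent with equations (5) and (A.7) as given in Appendix A; the derivation is offered as an illustration under that assumption. Setting θ = n:

```latex
\begin{align*}
S(f{:}c) &= p + \frac{\theta}{n}\left[(1-p) - p^{\beta}\right] \\
         &= p + (1-p) - p^{\beta} \qquad (\theta = n) \\
         &= 1 - p^{\beta}, \\[4pt]
\frac{\partial S(f{:}c)}{\partial p} &= -\beta\, p^{\beta-1} < 0
  \qquad \text{for } \beta > 0,\ 0 < p \le 1 .
\end{align*}
```

Under this assumed form, judged strength falls as the observed proportion p rises, so that apparently favorable evidence counts against the hypothesis.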

Ambiguity  and  Risk 

Although the importance of ambiguity for understanding risk has been
evident since Ellsberg's original article, its omission from the voluminous
literature on risk is puzzling. One reason for this omission may be the
reliance on the explicit lottery, with stated payoffs and probabilities,
for representing risky choice. Indeed, as Lopes (1983) has noted:

The simple, static lottery or gamble is as indispensable to
research on risk as is the fruitfly to genetics. The reason
is obvious; lotteries, like fruitflies, provide a simplified
laboratory model of the real world, one that displays its
essential characteristics while allowing for the manipulation
and control of important experimental variables. (1983, p. 137)

It should be further noted that the explicit lottery has been of equal
importance to those interested in axiom systems and formal models of risk.
While explicit lotteries have been, and continue to be, useful for studying
risk, the ambiguities surrounding real world processes in domains such as nuclear
power, environmental safety, and the like, accentuate the incomplete nature of
such representations. Indeed, Ellsberg pointed out the particular importance of
ambiguity in understanding people's reactions to new technologies (also see
Edwards & von Winterfeldt, 1982, for a historical look at reactions to earlier
technological innovations). In any event, the neglect of ambiguity in theories
of risk is slowly giving way to interest at both the formal-axiomatic level
(e.g., Fishburn, in press, 1983; Gärdenfors & Sahlin, 1982, in press; Morris,
1983) as well as the psychological level (Lopes, 1983).

From the perspective of this paper, the link between inference and choice
via ambiguity provides a unified treatment of uncertainty that has been
largely missing from current theories of risk. Moreover, our experimental
results, in which choices were predicted from knowledge of the parameters
obtained in the inference task, suggest that the process that affects
inferential judgments is also present in choices between ambiguous and
non-ambiguous options. We should emphasize, however, that we have not provided a
theory  of  risk.  In  particular,  we  have  not  treated  the  payoff  or  utility  side 
of the issue. However, we expect that ambiguity will interact with factors
such as whether payoffs are gains or losses (Kahneman & Tversky, 1979), the
absolute  size  of  payoffs,  and  the  type  of  conflict  in  the  gamble.  These 
issues  await  further  research. 

Extensions  to  Multiple  Sources  and  Time  Periods 

In  order  to  examine  inferences  under  ambiguity  in  depth,  we  have 
restricted  ourselves  to  how  evidence  from  a  single  source  is  evaluated  at  one 
point  in  time.  However,  consider  the  more  realistic  situation  where  decision 
makers  receive  information  from  multiple  source-types  (including  base  rates) 
over multiple time periods. The aggregation of information over source-types
and time can be conceptualized by an "evidence matrix" that has source-types
for  rows  and  time  periods  for  columns.  Such  a  matrix  is  shown  in  Figure  6. 

Insert Figure 6 about here

The  entries  in  each  cell  of  the  matrix  reflect  the  conflicting  evidence 
received  from  a  source-type  in  that  period.  The  matrix  provides  a  simple  yet 
powerful  way  to  look  at  a  wide  variety  of  inference  problems.  In  particular, 
by focusing on source-types (rows) or time periods (columns), one can look at
the combining of information either longitudinally, cross-sectionally, or
both. Furthermore, the issues of reliability and ambiguity become quite
complex here since there can be differential source reliability, varying
numbers of reports per source, and the sources may not be "independent."
While the challenge of understanding how people incorporate such factors into
their  judgments  is  formidable,  the  complexity  of  inferences  in  real  world 
settings  requires  that  attention  be  paid  to  these  issues. 

CONCLUSION 

In  considering  the  role  of  ambiguity  and  uncertainty  in  inferential 
judgments,  we  have  developed  a  quantitative  model  that  accounts  for  much 
existing data as well as our own experimental findings. Furthermore, we have
shown  how  this  model  relates  to  Keynes'  idea  of  the  weight  of  evidence,  the 
non-additivity  of  complementary  probabilities,  risky  choice,  and  current  work 
on  cognitive  heuristics.  Moreover,  since  inference  involves  "going  beyond  the 
information given" (Bruner, 1957), an important way to do this is to construct, via
imagination,  "what  might  have  been"  or  "what  might  be."  Such  constructions, 
whether  the  result  of  a  cognitive  simulation  process  as  proposed  here,  or  more 
elaborate  processes  (as  in  resolving  surprise),  pose  an  interesting  and  important 
trade-off for the organism. On the one hand, there are costs of investing in
imagination: increased mental effort and the discomfort that results from greater
uncertainty. On the other hand, considering the world as it isn't has the
benefit of protecting one from overconfidence and its nonadaptive consequences. Thus,
finding  the  appropriate  compromise  between  "what  is"  and  "what  might  have  been" 
(or, "what might be") is central to inferences under ambiguity and uncertainty.

¹At p_A = 0, 1 there is no ambiguity. Hence, the relation between p_A
and S(p_A) should be discontinuous. Indeed, the lack of ambiguity at the end
points  provides  a  rationale  for  the  discontinuity  of  the  decision-weight  function 
and  this  implies  the  "certainty  effect"  of  prospect  theory  (i.e.,  the  value  of 
sure  gambles  is  heightened  either  positively  or  negatively). 

²A listing of the program is available from the authors.
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APPENDIX  A 

This appendix considers the effects of different assumptions concerning the
weights given to greater and smaller values than that observed. In equation (5),
differential weighting is achieved by the β parameter; i.e., $k_g = \theta(1-p)$ and
$k_s = \theta p^{\beta}$. However, one could also consider linear weighting schemes where the
weights given to θp and θ(1-p) sum to one (i.e., a weighted averaging
process), or where the weights do not sum to one. For the former, let

$$k_g - k_s = \theta w(1-p) - \theta(1-w)p = \theta(w-p) \tag{A.1}$$

where 0 < w < 1 is the relative weight given to greater values. Substituting
(A.1) into equation (8), we obtain,

$$S_1(f:c) = p + \frac{\theta}{n}(w-p) \tag{A.2}$$

where $S_1(f:c)$ is used to denote alternative model 1. Note that in this model,
$S_1(f:c)$ is regressive with respect to p. Although this model has appealing
features, it is easy to show that it does not capture some aspects of our model
and data. Specifically, it always predicts additivity of judgments of complementary
events, i.e.,

$$S_1(f:c) + S_1(c:f) = p + \frac{\theta}{n}(w-p) + (1-p) + \frac{\theta}{n}\left[(1-w) - (1-p)\right] = 1 \tag{A.3}$$

However, non-additivity will occur if the weights accorded to θ(1-p) and θp
do not sum to one. A special case of this model, which we denote $S_2(f:c)$, and
which is similar to the S(f:c) model used in the paper, is one where,

$$k_g - k_s = \theta(1-p) - \theta m p \qquad (m > 0) \tag{A.4}$$

This yields

$$S_2(f:c) = p + \frac{\theta}{n}\left[1 - p - mp\right] \tag{A.5}$$

such that the additivity conditions are,

$$S_2(f:c) + S_2(c:f) = p + \frac{\theta}{n}\left[1 - p - mp\right] + (1-p) + \frac{\theta}{n}\left[p - m(1-p)\right] = 1 + \frac{\theta}{n}(1-m) \tag{A.6}$$

Thus, for m > 1, the model predicts sub-additivity; for m = 1, additivity;
and for m < 1, super-additivity. The difference between $S_2(f:c)$ and S(f:c)
is that the former predicts a constant amount of non-additivity irrespective of
the value of p. In the S(f:c) model, the level of p affects the amount of
additivity. This is shown in equation (9), which is reproduced here for
convenience,

$$S(f:c) + S(c:f) = 1 + \frac{\theta}{n}\left[1 - p^{\beta} - (1-p)^{\beta}\right] \tag{A.7}$$
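
The additivity properties of the three models can be checked numerically. The sketch below codes the sums in (A.3), (A.6), and (A.7) directly; the function names and the parameter values are hypothetical and serve only to illustrate the algebra.

```python
def gap_s1(p, theta, n, w):
    """S1(f:c) + S1(c:f) - 1 for the weighted-averaging model (A.2)."""
    s_fc = p + (theta / n) * (w - p)
    s_cf = (1 - p) + (theta / n) * ((1 - w) - (1 - p))
    return s_fc + s_cf - 1


def gap_s2(p, theta, n, m):
    """S2(f:c) + S2(c:f) - 1 for the linear model (A.5)."""
    s_fc = p + (theta / n) * ((1 - p) - m * p)
    s_cf = (1 - p) + (theta / n) * (p - m * (1 - p))
    return s_fc + s_cf - 1


def gap_s(p, theta, n, beta):
    """S(f:c) + S(c:f) - 1 for the model in the paper, as in (A.7)."""
    s_fc = p + (theta / n) * ((1 - p) - p ** beta)
    s_cf = (1 - p) + (theta / n) * (p - (1 - p) ** beta)
    return s_fc + s_cf - 1


THETA, N = 2.0, 10  # hypothetical values for illustration

for p in (0.3, 0.5, 0.8):
    print(
        f"p={p:.1f}  "
        f"S1 gap={gap_s1(p, THETA, N, w=0.6):+.4f}  "
        f"S2 gap={gap_s2(p, THETA, N, m=1.5):+.4f}  "
        f"S gap={gap_s(p, THETA, N, beta=0.5):+.4f}"
    )
```

The gap for S1 is zero at every p, the gap for S2 equals (θ/n)(1-m) at every p, and only the gap for S varies with p.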
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